CHAPTER 2
Statistics, Probability and Noise

Statistics and probability are used in Digital Signal Processing to characterize signals and the processes that generate them. For example, a primary use of DSP is to reduce interference, noise, and other undesirable components in acquired data. These may be an inherent part of the signal being measured, arise from imperfections in the data acquisition system, or be introduced as an unavoidable byproduct of some DSP operation. Statistics and probability allow these disruptive features to be measured and classified, the first step in developing strategies to remove the offending components. This chapter introduces the most important concepts in statistics and probability, with emphasis on how they apply to acquired signals.

Signal and Graph Terminology

A signal is a description of how one parameter is related to another parameter. For example, the most common type of signal in analog electronics is a voltage that varies with time. Since both parameters can assume a continuous range of values, we will call this a continuous signal. In comparison, passing this signal through an analog-to-digital converter forces each of the two parameters to be quantized. For instance, imagine the conversion being done with 12 bits at a sampling rate of 1000 samples per second. The voltage is curtailed to 4096 ($2^{12}$) possible binary levels, and the time is only defined at one millisecond increments. Signals formed from parameters that are quantized in this manner are said to be discrete signals or digitized signals. For the most part, continuous signals exist in nature, while discrete signals exist inside computers (although you can find exceptions to both cases). It is also possible to have signals where one parameter is continuous and the other is discrete. Since these mixed signals are quite uncommon, they do not have special names given to them, and the nature of the two parameters must be explicitly stated.
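The 12-bit, 1000 samples-per-second conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input range of -1 V to +1 V, the function name digitize, and the 5 Hz test sine wave are all hypothetical and do not come from the text. Only the bit depth and sampling rate do.

```python
import math

BITS = 12                  # converter resolution, from the text
SAMPLE_RATE_HZ = 1000      # sampling rate, from the text
V_MIN, V_MAX = -1.0, 1.0   # assumed input range; the text does not give one


def digitize(signal_fn, n_samples):
    """Sample signal_fn every 1/SAMPLE_RATE_HZ seconds and quantize to BITS bits."""
    levels = 2 ** BITS                      # 4096 possible codes
    step = (V_MAX - V_MIN) / (levels - 1)   # voltage spacing between adjacent codes
    times, codes = [], []
    for n in range(n_samples):
        t = n / SAMPLE_RATE_HZ              # time exists only at 1 ms increments
        v = min(max(signal_fn(t), V_MIN), V_MAX)   # clip to the converter range
        codes.append(round((v - V_MIN) / step))    # nearest integer code, 0..4095
        times.append(t)
    return times, codes


# Hypothetical test input: a 5 Hz sine wave of 0.8 V amplitude, 500 samples.
times, codes = digitize(lambda t: 0.8 * math.sin(2 * math.pi * 5 * t), 500)
```

Both parameters end up quantized: every entry of codes is an integer between 0 and 4095, and every entry of times is a multiple of one millisecond.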
The Scientist and Engineer's Guide to Digital Signal Processing

Figure 2-1 shows two discrete signals, such as might be acquired with a digital data acquisition system. The vertical axis may represent voltage, light intensity, sound pressure, or an infinite number of other parameters. Since we don't know what it represents in this particular case, we will give it the generic label: amplitude. This parameter is also called several other names: the y-axis, the dependent variable, the range, and the ordinate.

The horizontal axis represents the other parameter of the signal, going by such names as: the x-axis, the independent variable, the domain, and the abscissa. Time is the most common parameter to appear on the horizontal axis of acquired signals; however, other parameters are used in specific applications. For example, a geophysicist might acquire measurements of rock density at equally spaced distances along the surface of the earth. To keep things general, we will simply label the horizontal axis: sample number. If this were a continuous signal, another label would have to be used, such as: time, distance, x, etc.

The two parameters that form a signal are generally not interchangeable. The parameter on the y-axis (the dependent variable) is said to be a function of the parameter on the x-axis (the independent variable). In other words, the independent variable describes how or when each sample is taken, while the dependent variable is the actual measurement. Given a specific value on the x-axis, we can always find the corresponding value on the y-axis, but usually not the other way around.

Pay particular attention to the word: domain, a very widely used term in DSP.
For instance, a signal that uses time as the independent variable (i.e., the parameter on the horizontal axis) is said to be in the time domain. Another common signal in DSP uses frequency as the independent variable, resulting in the term, frequency domain. Likewise, signals that use distance as the independent parameter are said to be in the spatial domain (distance is a measure of space). The type of parameter on the horizontal axis is the domain of the signal; it's that simple. What if the x-axis is labeled with something very generic, such as sample number? Authors commonly refer to these signals as being in the time domain. This is because sampling at equal intervals of time is the most common way of obtaining signals, and they don't have anything more specific to call it.

Although the signals in Fig. 2-1 are discrete, they are displayed in this figure as continuous lines. This is because there are too many samples to be distinguishable if they were displayed as individual markers. In graphs that portray shorter signals, say fewer than 100 samples, the individual markers are usually shown. Continuous lines may or may not be drawn to connect the markers, depending on how the author wants you to view the data. For instance, a continuous line could imply what is happening between samples, or simply be an aid to help the reader's eye follow a trend in noisy data. The point is, examine the labeling of the horizontal axis to find if you are working with a discrete or continuous signal. Don't rely on an illustrator's ability to draw dots.

The variable, $N$, is widely used in DSP to represent the total number of samples in a signal. For example, $N = 512$ for the signals in Fig. 2-1. To keep the data organized, each sample is assigned a sample number or index. These are the numbers that appear along the horizontal axis. Two notations for assigning sample numbers are commonly used. In the first notation, the sample indexes run from 1 to $N$ (e.g., 1 to 512). In the second notation, the sample indexes run from 0 to $N-1$ (e.g., 0 to 511). Mathematicians often use the first method (1 to $N$), while those in DSP commonly use the second (0 to $N-1$). In this book, we will use the second notation. Don't dismiss this as a trivial problem. It will confuse you sometime during your career. Look out for it!

FIGURE 2-1
Examples of two digitized signals with different means and standard deviations. Both panels plot amplitude against sample number (0 to 511). Panel a: mean = 0.5, $\sigma = 1$. Panel b: mean = 3.0, $\sigma = 0.2$.

Mean and Standard Deviation

The mean, indicated by $\mu$ (a lower case Greek mu), is the statistician's jargon for the average value of a signal. It is found just as you would expect: add all of the samples together, and divide by $N$. It looks like this in mathematical form:

EQUATION 2-1
Calculation of a signal's mean. The signal is contained in $x_0$ through $x_{N-1}$, $i$ is an index that runs through these values, and $\mu$ is the mean.

$$\mu = \frac{1}{N} \sum_{i=0}^{N-1} x_i$$

In words, sum the values in the signal, $x_i$, by letting the index, $i$, run from 0 to $N-1$. Then finish the calculation by dividing the sum by $N$. This is identical to the equation: $\mu = (x_0 + x_1 + x_2 + \cdots + x_{N-1})/N$. If you are not already familiar with $\Sigma$ (upper case Greek sigma) being used to indicate summation, study these equations carefully, and compare them with the computer program in Table 2-1. Summations of this type are abundant in DSP, and you need to understand this notation fully.
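As an illustration of Eq. 2-1, the summation can be written as an explicit loop. This is a minimal Python sketch, not the program from Table 2-1. The function name mean and the four-sample signal are hypothetical. Python lists happen to use the second indexing notation, so the loop index runs from 0 to $N-1$ exactly as in the equation.

```python
def mean(x):
    """Eq. 2-1: add the samples x[0] through x[N-1], then divide by N."""
    n = len(x)              # N, the total number of samples
    total = 0.0
    for i in range(n):      # i runs from 0 to N-1
        total += x[i]
    return total / n


signal = [1.0, 3.0, 2.0, 6.0]   # hypothetical 4-sample signal
mu = mean(signal)               # (1 + 3 + 2 + 6) / 4
```

With the first notation (indexes 1 to $N$), the same loop would run from 1 to $N$ inclusive. Mixing the two conventions silently skips one sample or reads one past the end, which is the confusion the text warns about.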
In electronics, the mean is commonly called the DC (direct current) value. Likewise, AC (alternating current) refers to how the signal fluctuates around the mean value. If the signal is a simple repetitive waveform, such as a sine or square wave, its excursions can be described by its peak-to-peak amplitude. Unfortunately, most acquired signals do not show a well defined peak-to-peak value, but have a random nature, such as the signals in Fig. 2-1. A more generalized method must be used in these cases, called the standard deviation, denoted by $\sigma$ (a lower case Greek sigma).

As a starting point, the expression, $|x_i - \mu|$, describes how far the $i$-th sample deviates (differs) from the mean. The average deviation of a signal is found by summing the deviations of all the individual samples, and then dividing by the number of samples, $N$. Notice that we take the absolute value of each deviation before the summation; otherwise the positive and negative terms would average to zero. The average deviation provides a single number representing the typical distance that the samples are from the mean. While convenient and straightforward, the average deviation is almost never used in statistics. This is because it doesn't fit well with the physics of how signals operate. In most cases, the important parameter is not the deviation from the mean, but the power represented by the deviation from the mean. For example, when random noise signals combine in an electronic circuit, the resultant noise is equal to the combined power of the individual signals, not their combined amplitude.

The standard deviation is similar to the average deviation, except the averaging is done with power instead of amplitude. This is achieved by squaring each of the deviations before taking the average (remember, power $\propto$ voltage$^2$). To finish, the square root is taken to compensate for the initial squaring. In equation form, the standard deviation is calculated:

EQUATION 2-2
Calculation of the standard deviation of a signal. The signal is stored in $x_i$, $\mu$ is the mean found from Eq. 2-1, $N$ is the number of samples, and $\sigma$ is the standard deviation.

$$\sigma^2 = \frac{1}{N-1} \sum_{i=0}^{N-1} (x_i - \mu)^2$$

In the alternative notation: $\sigma = \sqrt{\left[ (x_0 - \mu)^2 + (x_1 - \mu)^2 + \cdots + (x_{N-1} - \mu)^2 \right] / (N-1)}$. Notice that the average is carried out by dividing by $N-1$ instead of $N$. This is a subtle feature of the equation that will be discussed in the next section. The term, $\sigma^2$, occurs frequently in statistics and is given the name variance. The standard deviation is a measure of how far the signal fluctuates from the mean. The variance represents the power of this fluctuation.

Another term you should become familiar with is the rms (root-mean-square) value, frequently used in electronics. By definition, the standard deviation only measures the AC portion of a signal, while the rms value measures both the AC and DC components. If a signal has no DC component, its rms value is identical to its standard deviation. Figure 2-2 shows the relationship between the standard deviation and the peak-to-peak value of several common waveforms.
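The quantities discussed above can be compared side by side in a short Python sketch. The function names and the eight-sample signal are hypothetical. The rms function assumes the usual definition (square root of the mean of the squared samples, dividing by $N$), which the text does not spell out. The variance follows Eq. 2-2 and divides by $N-1$.

```python
import math


def mean(x):
    """Eq. 2-1: the average (DC) value."""
    return sum(x) / len(x)


def average_deviation(x):
    """Mean of |x_i - mu|: averaging done on amplitude."""
    mu = mean(x)
    return sum(abs(xi - mu) for xi in x) / len(x)


def variance(x):
    """Eq. 2-2: squared deviations summed, then divided by N-1."""
    mu = mean(x)
    return sum((xi - mu) ** 2 for xi in x) / (len(x) - 1)


def standard_deviation(x):
    """Square root of the variance: averaging done on power."""
    return math.sqrt(variance(x))


def rms(x):
    """Assumed definition: sqrt of the mean of x_i squared (AC and DC together)."""
    return math.sqrt(sum(xi ** 2 for xi in x) / len(x))


signal = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical samples, mean 5
avg_dev = average_deviation(signal)                 # 12 / 8
var = variance(signal)                              # 32 / 7
sigma = standard_deviation(signal)
rms_value = rms(signal)                             # sqrt(232 / 8)

# Square wave alternating between +1 and -1: peak-to-peak amplitude is 2.
square = [1.0, -1.0] * 500
ratio = (max(square) - min(square)) / standard_deviation(square)
```

For the square wave, ratio comes out very close to 2, in line with Fig. 2-2. Because Eq. 2-2 divides by $N-1$ while this rms divides by $N$, the two values for a zero-mean signal differ by a factor of $\sqrt{N/(N-1)}$, which becomes negligible as $N$ grows.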
For the most part, continuous signals exist in nature, while discrete signals exist inside computers (although you can find exceptions to both cases). It is also possible to have signals where one parameter is continuous and the other is discrete. Since these mixed signals are quite uncommon, they do not have special names given to them, and the nature of the two parameters must be explicitly stated. Figure 2-1 shows two discrete signals, such as might be acquired with a digital data acquisition system. The vertical axis may represent voltage, light The Scientist and Engineer's Guide to Digital Signal Processing 12 intensity, sound pressure, or an infinite number of other parameters. Since we don't know what it represents in this particular case, we will give it the generic label: amplitude . This parameter is also called several other names: the y- axis , the dependent variable , the range , and the ordinate . The horizontal axis represents the other parameter of the signal, going by such names as: the x-axis , the independent variable , the domain , and the abscissa . Time is the most common parameter to appear on the horizontal axis of acquired signals; however, other parameters are used in specific applications. For example, a geophysicist might acquire measurements of rock density at equally spaced distances along the surface of the earth. To keep things general, we will simply label the horizontal axis: sample number . If this were a continuous signal, another label would have to be used, such as: time , distance , x , etc. The two parameters that form a signal are generally not interchangeable. The parameter on the y-axis (the dependent variable) is said to be a function of the parameter on the x-axis (the independent variable). In other words, the independent variable describes how or when each sample is taken, while the dependent variable is the actual measurement. 
Given a specific value on the x-axis, we can always find the corresponding value on the y-axis, but usually not the other way around. Pay particular attention to the word: domain , a very widely used term in DSP. For instance, a signal that uses time as the independent variable (i.e., the parameter on the horizontal axis), is said to be in the time domain . Another common signal in DSP uses frequency as the independent variable, resulting in the term, frequency domain . Likewise, signals that use distance as the independent parameter are said to be in the spatial domain (distance is a measure of space). The type of parameter on the horizontal axis is the domain of the signal; it's that simple. What if the x-axis is labeled with something very generic, such as sample number ? Authors commonly refer to these signals as being in the time domain. This is because sampling at equal intervals of time is the most common way of obtaining signals, and they don't have anything more specific to call it. Although the signals in Fig. 2-1 are discrete, they are displayed in this figure as continuous lines. This is because there are too many samples to be distinguishable if they were displayed as individual markers. In graphs that portray shorter signals, say less than 100 samples, the individual markers are usually shown. Continuous lines may or may not be drawn to connect the markers, depending on how the author wants you to view the data. For instance, a continuous line could imply what is happening between samples, or simply be an aid to help the reader's eye follow a trend in noisy data. The point is, examine the labeling of the horizontal axis to find if you are working with a discrete or continuous signal. Don't rely on an illustrator's ability to draw dots. The variable, N , is widely used in DSP to represent the total number of samples in a signal. For example, for the signals in Fig. 2-1. 
To N ' 512 Chapter 2- Statistics, Probability and Noise 13 Sample number 0 64 128 192 256 320 384 448 512 -4 -2 0 2 4 6 8 511 a. Mean = 0.5, F = 1 Sample number 0 64 128 192 256 320 384 448 512 -4 -2 0 2 4 6 8 511 b. Mean = 3.0, F = 0.2 Amplitude Amplitude FIGURE 2-1 Examples of two digitized signals with different means and standard deviations . EQUATION 2-1 Calculation of a signal's mean. The signal is contained in x 0 through x N -1 , i is an index that runs through these values, and µ is the mean. µ ' 1 N j N & 1 i ' 0 x i keep the data organized, each sample is assigned a sample number or index . These are the numbers that appear along the horizontal axis. Two notations for assigning sample numbers are commonly used. In the first notation, the sample indexes run from 1 to N (e.g., 1 to 512). In the second notation, the sample indexes run from 0 to (e.g., 0 to 511). N & 1 Mathematicians often use the first method (1 to N ), while those in DSP commonly uses the second (0 to ). In this book, we will use the second N & 1 notation. Don't dismiss this as a trivial problem. It will confuse you sometime during your career. Look out for it! Mean and Standard Deviation The mean , indicated by µ (a lower case Greek mu ), is the statistician's jargon for the average value of a signal. It is found just as you would expect: add all of the samples together, and divide by N . It looks like this in mathematical form: In words, sum the values in the signal, , by letting the index, i , run from 0 x i to . Then finish the calculation by dividing the sum by N . This is N & 1 identical to the equation: . If you are not already µ ' ( x 0 % x 1 % x 2 % þ% x N & 1 ) / N familiar with E (upper case Greek sigma ) being used to indicate summation , study these equations carefully, and compare them with the computer program in Table 2-1. Summations of this type are abundant in DSP, and you need to understand this notation fully. 
In electronics, the mean is commonly called the DC (direct current) value. Likewise, AC (alternating current) refers to how the signal fluctuates around the mean value. If the signal is a simple repetitive waveform, such as a sine or square wave, its excursions can be described by its peak-to-peak amplitude. Unfortunately, most acquired signals do not show a well defined peak-to-peak value, but have a random nature, such as the signals in Fig. 2-1. A more generalized method must be used in these cases, called the standard deviation, denoted by σ (a lower case Greek sigma).

As a starting point, the expression $|x_i - \mu|$ describes how far the i-th sample deviates (differs) from the mean. The average deviation of a signal is found by summing the deviations of all the individual samples, and then dividing by the number of samples, N. Notice that we take the absolute value of each deviation before the summation; otherwise the positive and negative terms would average to zero. The average deviation provides a single number representing the typical distance that the samples are from the mean. While convenient and straightforward, the average deviation is almost never used in statistics. This is because it doesn't fit well with the physics of how signals operate. In most cases, the important parameter is not the deviation from the mean, but the power represented by the deviation from the mean. For example, when random noise signals combine in an electronic circuit, the resultant noise is equal to the combined power of the individual signals, not their combined amplitude.

$$\sigma^2 = \frac{1}{N-1} \sum_{i=0}^{N-1} (x_i - \mu)^2$$

EQUATION 2-2 Calculation of the standard deviation of a signal. The signal is stored in $x_i$, µ is the mean found from Eq. 2-1, N is the number of samples, and σ is the standard deviation.
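The average deviation described above, and the idea that noise combines by power instead of amplitude, can both be sketched numerically. The Python below is illustrative only: the function names `average_deviation` and `combined_noise` are hypothetical, and the power-addition rule assumes the two noise sources are uncorrelated, which the text does not state explicitly.

```python
import math


def average_deviation(x):
    """Mean of |x[i] - mu| over all N samples."""
    n = len(x)
    mu = sum(x) / n
    return sum(abs(v - mu) for v in x) / n


def combined_noise(sigma_a, sigma_b):
    """Amplitude of two combined noise sources (assumed uncorrelated):
    the powers (squared amplitudes) add, not the amplitudes themselves."""
    return math.sqrt(sigma_a**2 + sigma_b**2)


print(average_deviation([1.0, 2.0, 3.0, 4.0, 5.0]))  # 1.2
print(combined_noise(3.0, 4.0))                      # 5.0, not 7.0
```

Under that assumption, two noise sources with amplitudes 3 and 4 combine to 5, because 9 + 16 = 25 units of power. A measure built on squared deviations follows this rule naturally, whereas the average deviation does not.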
The standard deviation is similar to the average deviation, except the averaging is done with power instead of amplitude. This is achieved by squaring each of the deviations before taking the average (remember, power ∝ voltage²). To finish, the square root is taken to compensate for the initial squaring. In equation form, the standard deviation is calculated as in Eq. 2-2. In the alternative notation: $\sigma = \sqrt{\left[(x_0 - \mu)^2 + (x_1 - \mu)^2 + \cdots + (x_{N-1} - \mu)^2\right] / (N-1)}$. Notice that the average is carried out by dividing by $N-1$ instead of N. This is a subtle feature of the equation that will be discussed in the next section. The term $\sigma^2$ occurs frequently in statistics and is given the name variance. The standard deviation is a measure of how far the signal fluctuates from the mean. The variance represents the power of this fluctuation.

Another term you should become familiar with is the rms (root-mean-square) value, frequently used in electronics. By definition, the standard deviation only measures the AC portion of a signal, while the rms value measures both the AC and DC components. If a signal has no DC component, its rms value is identical to its standard deviation. Figure 2-2 shows the relationship between the standard deviation and the peak-to-peak value of several common waveforms.

FIGURE 2-2 Ratio of the peak-to-peak amplitude to the standard deviation for several common waveforms. For the square wave, this ratio is 2; for the triangle wave it is $\sqrt{12} = 3.46$; for the sine wave it is $2\sqrt{2} = 2.83$. While random noise has no exact peak-to-peak value, it is approximately 6 to 8 times the standard deviation. a. Square wave, Vpp = 2σ. b. Triangle wave, Vpp = √12 σ. c. Sine wave, Vpp = 2√2 σ. d. Random noise, Vpp ≈ 6-8 σ.

TABLE 2-1

    100 'CALCULATION OF THE MEAN AND STANDARD DEVIATION
    110 '
    120 DIM X[511]                  'The signal is held in X[0] to X[511]
    130 N% = 512                    'N% is the number of points in the signal
    140 '
    150 GOSUB XXXX                  'Mythical subroutine that loads the signal into X[ ]
    160 '
    170 MEAN = 0                    'Find the mean via Eq. 2-1
    180 FOR I% = 0 TO N%-1
    190   MEAN = MEAN + X[I%]
    200 NEXT I%
    210 MEAN = MEAN/N%
    220 '
    230 VARIANCE = 0                'Find the standard deviation via Eq. 2-2
    240 FOR I% = 0 TO N%-1
    250   VARIANCE = VARIANCE + ( X[I%] - MEAN )^2
    260 NEXT I%
    270 VARIANCE = VARIANCE/(N%-1)
    280 SD = SQR(VARIANCE)
    290 '
    300 PRINT MEAN, SD              'Print the calculated mean and standard deviation
    310 '
    320 END

Table 2-1 lists a computer routine for calculating the mean and standard deviation using Eqs. 2-1 and 2-2. The programs in this book are intended to convey algorithms in the most straightforward way; all other factors are treated as secondary. Good programming techniques are disregarded if doing so makes the program logic clearer. For instance: a simplified version of BASIC is used, line numbers are included, the only control structure allowed is the FOR-NEXT loop, there are no I/O statements, etc. Think of these programs as an alternative way of understanding the equations used
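For readers more comfortable with a modern language, the routine in Table 2-1 can be approximated in Python. This is a sketch and not a line-for-line translation: the function name `mean_and_sd` is an assumption, and the square-wave test signal is a stand-in for the table's unspecified signal-loading subroutine.

```python
import math


def mean_and_sd(x):
    """Mean (Eq. 2-1) and standard deviation (Eq. 2-2) of the samples in x."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples to divide by N-1")

    mean = 0.0
    for i in range(n):                 # Eq. 2-1: sum the samples, divide by N
        mean += x[i]
    mean /= n

    variance = 0.0
    for i in range(n):                 # Eq. 2-2: sum the squared deviations
        variance += (x[i] - mean) ** 2
    variance /= n - 1                  # divide by N-1, not N

    return mean, math.sqrt(variance)


# Stand-in signal (an assumption): 512 samples of a square wave between -1 and +1
signal = [1.0, -1.0] * 256
mean, sd = mean_and_sd(signal)
print(mean, sd, 2.0 / sd)   # peak-to-peak (2.0) divided by sd is close to 2
```

For this finite record, the N-1 divisor makes the standard deviation slightly larger than 1, so the peak-to-peak ratio comes out just under 2, in line with the square-wave entry in Fig. 2-2.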
