4. Sampling

4.1. Concepts of Sampling

Many variables in civil engineering are spatially distributed. Examples include the concentration of pollutants and the variation of material properties, such as the strength and stiffness of concrete and soils. The purpose of sampling is to obtain estimates of population parameters (e.g. means, variances, covariances) that characterize the entire population distribution without observing and measuring every element in the sampled population. Sampling theory for spatial processes principally involves evaluating the sampling distributions and confidence limits of estimators. A very good introduction to these methods and to the uses and advantages of sampling is provided by Cochran (1977) and Baecher and Christian (2003).

An estimate is the realization of a particular sample statistic for a specific set of sample observations. Estimates are not exact, and their uncertainty is reflected in the variance of their distribution about the true parameter value they estimate. This variance is, in turn, a function of both the sampling plan and the sampled population. By knowing this variance and making assumptions about the shape of the distribution, confidence limits on the true population parameters can be set.

A sampling plan is a program of action for collecting data from a sampled population. Common plans are grouped into many types: for example, simple random, systematic, stratified random, cluster, traverse, line intersect, and so on. In deciding among plans, or in designing a specific program once the type of plan has been chosen, one attempts to obtain the highest precision for a fixed sampling cost, or the lowest sampling cost for a fixed precision or a specified confidence interval.

4.2. Common Spatial Sampling Plans

Statistical sampling is a common activity in many human enterprises, from the national census, to market research, to scientific research. As a result, common situations are encountered in many different endeavors, and a family of sampling plans has grown up to handle these situations. Simple random sampling, systematic sampling, stratified random sampling, and cluster sampling are considered in the following section.

4.2.1. Simple random sampling

The characteristic property of simple random sampling is that individuals are chosen at random from the sampled population, and each element of the population has an equal probability of being observed. An unbiased estimator of the population mean from a simple random sample $x = \{x_1, \ldots, x_n\}$ is the sample mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{1}$$

This estimator has sampling variance

$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}\,\frac{N-n}{N} \tag{2}$$

where $\sigma^2$ is the (true) variance of the sampled population and $N$ is the total sampled population size. The term $(N-n)/N$ is called the finite population factor, which can safely be ignored when $n$ is less than about 10% of $N$. However, since $\sigma^2$ is usually unknown, it is estimated by the sample variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{3}$$

in which the denominator is taken as $n-1$ rather than $n$, reflecting the loss of a degree of freedom due to estimating the mean from the same data. The estimator is unbiased but does not have minimum variance.

The only choice (i.e. allocation) to be made in simple random sampling is the sample size $n$. Since the sampling variance of the mean is inversely proportional to sample size, $\mathrm{Var}(\bar{x}) \propto n^{-1}$, a given estimator precision can be obtained by adjusting the sample size, if $\sigma$ is known or assumed. A sampling plan can be optimized for total cost by assuming some relationship between $\mathrm{Var}(\bar{x})$ and cost in construction or design. A common assumption is that this cost is proportional to the square root of the variance, usually called the standard error of the mean, $\sigma_{\bar{x}} = \mathrm{Var}(\bar{x})^{1/2}$.
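
The estimators in Equations (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the population of 500 strength values, the sample size of 40, the target variance of 0.25 and all function names are assumptions made for the example. The last helper rearranges Equation (2) to find the sample size needed for a target variance of the mean.

```python
import math
import random


def sample_mean(xs):
    """Equation (1): arithmetic mean of the sample."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Equation (3): sample variance with the n - 1 denominator."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def variance_of_mean(var, n, N):
    """Equation (2): sampling variance of the mean, including (N - n) / N."""
    return (var / n) * (N - n) / N


def required_sample_size(var, target_var, N):
    """Smallest n for which Equation (2) does not exceed target_var.

    Rearranging var / n * (N - n) / N = V gives n = var / (V + var / N).
    """
    return math.ceil(var / (target_var + var / N))


# Hypothetical population of N = 500 strength values (illustrative only).
rng = random.Random(42)
population = [rng.gauss(30.0, 5.0) for _ in range(500)]
N = len(population)

# Simple random sample without replacement: every element equally likely.
sample = rng.sample(population, 40)
n = len(sample)

x_bar = sample_mean(sample)
s2 = sample_variance(sample)  # stands in for the unknown population variance
se = math.sqrt(variance_of_mean(s2, n, N))  # standard error of the mean
print(f"mean = {x_bar:.2f}, s^2 = {s2:.2f}, standard error = {se:.3f}")
print("n needed for Var(mean) <= 0.25:", required_sample_size(s2, 0.25, N))
```

Because the sample variance is substituted for the unknown population variance, the computed standard error and sample size are themselves estimates.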

It is usually assumed that the estimates $\bar{y}$ and $\hat{Y}$ are normally distributed about the corresponding population values. Here, following the notation of Cochran (1977), $y_i$ denotes the observations, $\bar{y}$ the sample mean, $\hat{Y} = N\bar{y}$ the estimated population total, $s$ the sample standard deviation, and $f = n/N$ the sampling fraction. If the assumption holds, lower and upper confidence limits for the population mean and total are as follows:

Mean:

$$\hat{\bar{Y}}_L = \bar{y} - \frac{t s}{\sqrt{n}}\sqrt{1-f}, \qquad \hat{\bar{Y}}_U = \bar{y} + \frac{t s}{\sqrt{n}}\sqrt{1-f}$$

Total:

$$\hat{Y}_L = N\bar{y} - \frac{t N s}{\sqrt{n}}\sqrt{1-f}, \qquad \hat{Y}_U = N\bar{y} + \frac{t N s}{\sqrt{n}}\sqrt{1-f}$$

The symbol $t$ is the value of the normal deviate corresponding to the desired confidence probability. The most common values are tabulated below:

| Confidence probability (%) | 50 | 80 | 90 | 95 | 99 |
|---|---|---|---|---|---|
| Normal deviate, t | 0.67 | 1.28 | 1.64 | 1.96 | 2.58 |

If the sample size is less than 60, the percentage points may be taken from Student's t table with $(n-1)$ degrees of freedom, these being the degrees of freedom in the estimated $s^2$. The t distribution holds exactly only if the observations $y_i$ are themselves normally distributed and $N$ is infinite. Moderate departures from normality do not affect it greatly. For small samples with very skew distributions, special methods are needed.
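
The confidence limits for the mean and the total can be sketched as two small Python functions. This is an illustrative sketch only: the function names and the summary statistics in the usage lines are hypothetical, and the normal deviates are the ones tabulated in this section (Student's t values would be substituted for small samples).

```python
import math

# Normal deviates tabulated in this section, keyed by confidence probability (%).
NORMAL_DEVIATE = {50: 0.67, 80: 1.28, 90: 1.64, 95: 1.96, 99: 2.58}


def mean_limits(y_bar, s, n, N, t):
    """Lower and upper confidence limits for the population mean."""
    f = n / N  # sampling fraction
    half_width = t * s / math.sqrt(n) * math.sqrt(1.0 - f)
    return y_bar - half_width, y_bar + half_width


def total_limits(y_bar, s, n, N, t):
    """Lower and upper confidence limits for the population total N * y_bar."""
    lower, upper = mean_limits(y_bar, s, n, N, t)
    return N * lower, N * upper


# Hypothetical summary statistics, chosen only for illustration.
n, N = 40, 500
y_bar, s = 30.0, 5.0

lo, hi = mean_limits(y_bar, s, n, N, NORMAL_DEVIATE[95])
print(f"95% limits for the mean:  {lo:.2f} to {hi:.2f}")
lo_t, hi_t = total_limits(y_bar, s, n, N, NORMAL_DEVIATE[95])
print(f"95% limits for the total: {lo_t:.0f} to {hi_t:.0f}")
```

The limits for the total are simply N times the limits for the mean, which is why the second function reuses the first.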

An example of the application is as follows.

Example. At a site, 676 borehole data sheets are available to characterize the substrata and obtain design parameters. Each data sheet has room for 42 entries reflecting the various characteristics of the soils, viz. compressibility, shear strength, compaction control, permeability, etc. An audit revealed that some data sheets are not completely filled in. The audit party verified a random sample of 50 sheets (about a 7% sample), and the results are given in Table 1.

Table 1. Results for a sample of 50 data sheets

| Number of entries, y_i | Frequency, f_i |
|---|---|
| 42 | 23 |
| 41 | 4 |
| 36 | 1 |
| 32 | 1 |
| 29 | 1 |
| 27 | 2 |
| 23 | 1 |
| 19 | 1 |
| 16 | 2 |
| 15 | 2 |
| 14 | 1 |
| 11 | 1 |
| 10 | 1 |
| 9 | 1 |
| 7 | 1 |
| 6 | 3 |
| 5 | 2 |
| 4 | 1 |
| 3 | 1 |
| Total, Σ f_i | 50 |

We find

$$n = \sum f_i = 50, \qquad y = \sum f_i y_i = 1471, \qquad \sum f_i y_i^2 = 54{,}497$$

Hence the estimated total number of entries is

$$\hat{Y} = N\bar{y} = \frac{(676)(1471)}{50} = 19{,}888$$

For the sample variance $s^2$ we have

$$s^2 = \frac{1}{n-1}\sum f_i (y_i - \bar{y})^2 = \frac{1}{n-1}\left[\sum f_i y_i^2 - \frac{\left(\sum f_i y_i\right)^2}{\sum f_i}\right] = \frac{1}{49}\left[54{,}497 - \frac{(1471)^2}{50}\right] = 229.0$$

The 80% confidence limits are given by

$$19{,}888 \pm \frac{t N s}{\sqrt{n}}\sqrt{1-f} = 19{,}888 \pm \frac{(1.28)(676)(15.13)}{\sqrt{50}}\sqrt{1-0.0740}$$

This gives 18,107 and 21,669 for the 80% limits. A complete count showed 21,045 entries, which is close to the upper limit.

4.2.2. Systematic sampling

In systematic sampling the first observation is chosen at random and subsequent observations are chosen periodically throughout the population. To select a sample of n units, we take a unit at random from the first k units and every k-th unit thereafter.
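
The selection rule just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the frame of 600 borehole identifiers and the fixed random seed are all hypothetical, the interval is taken as k = N / n rounded down, n is assumed not to exceed N, and selection probabilities are exactly equal only when N is a multiple of n.

```python
import random


def systematic_sample(frame, n, rng=random):
    """Pick a random start among the first k units, then every k-th unit."""
    N = len(frame)
    k = N // n  # sampling interval; assumes n <= N and N a multiple of n
    start = rng.randrange(k)  # random position within the first k units
    return [frame[start + j * k] for j in range(n)]


# Hypothetical sampling frame: 600 borehole identifiers in field order.
frame = [f"BH-{i:03d}" for i in range(1, 601)]
sample = systematic_sample(frame, n=50, rng=random.Random(7))
print(len(sample), sample[:3])
```

Only one random number is drawn, which is part of why the method is quick to carry out in the field.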

The method involves the selection of every k-th element from a sampling frame, where k, the sampling interval, is calculated as:

k = population size (N) / sample size (n)

Using this procedure, each element in the population has a known and equal probability of selection. This makes systematic sampling functionally similar to simple random sampling. It is, however, much more efficient (if the variance within the systematic sample is greater than the variance of the population) and much less expensive to carry out. The advantages of this approach are that (1) mistakes in sampling are minimized and the operation is speedy, and (2) the sample is spread uniformly over the population and is likely to be more precise than simple random sampling.

An unbiased estimate of the mean from a systematic sample is given by the same expression as Equation (1). The sampling variance of this estimate is
