Chapter 4 Sampling - Notes, Civil Engineering, Semester Notes | EduRev

Created by: Renu Garg

4. Sampling 
4.1. Concepts of Sampling  
Many variables in civil engineering are spatially distributed. Examples include the
concentration of pollutants and material properties such as the strength and stiffness of
concrete and soils. The purpose of sampling is to obtain estimates of population
parameters (e.g. means, variances, covariances) that characterize the entire population
distribution without observing and measuring every element in the sampled population.
Sampling theory for spatial processes principally involves evaluating the sampling
distributions and confidence limits of estimators. A very good introduction to these
methods and to the uses and advantages of sampling is provided by Cochran (1977)
and Baecher and Christian (2003).
An estimate is the realization of a particular sample statistic for a specific set of sample
observations. Estimates are not exact, and their uncertainty is reflected in the variance of
their distribution about the true parameter value they estimate. This variance is, in turn, a
function of both the sampling plan and the sampled population. By knowing this variance
and making assumptions about the shape of the distribution, confidence limits on the true
population parameters can be set.
A sampling plan is a program of action for collecting data from a sampled population.
Common plans are grouped into many types: for example, simple random, systematic,
stratified random, cluster, traverse, line intersect, and so on. In deciding among plans, or
in designing a specific program once the type of plan has been chosen, one attempts to
obtain the highest precision for a fixed sampling cost, or the lowest sampling cost for a
fixed precision or a specified confidence interval.
4.2. Common Spatial Sampling Plans  
Statistical sampling is a common activity in many human enterprises, from the national 
census, to market research, to scientific research. As a result, common situations are 
encountered in many different endeavors, and a family of sampling plans has grown up to 
handle these situations. Simple random sampling, systematic sampling, stratified random
sampling, and cluster sampling are considered in the following sections.
4.2.1. Simple random sampling  
The characteristic property of simple random sampling is that individuals are chosen at
random from the sampled population, and each element of the population has an equal
probability of being observed. An unbiased estimator of the population mean from a
simple random sample $x = \{x_1, \ldots, x_n\}$ is the sample mean

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{1} \]

This estimator has sampling variance

\[ \mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n} \, \frac{N - n}{N} \tag{2} \]
where $\sigma^2$ is the (true) variance of the sampled population and N is the total sampled
population size. The term (N-n)/N is called the finite population correction factor, which
for n less than about 10% of N can safely be ignored. However, since $\sigma^2$ is usually
unknown, it is estimated by the sample variance
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{3} \]
in which the denominator is taken as n-1 rather than n, reflecting the loss of a degree of
freedom due to estimating the mean from the same data. The estimator is unbiased but
does not have minimum variance. The only choice (i.e. allocation) to be made in simple
random sampling is the sample size n. Since the sampling variance of the mean is
inversely proportional to sample size, $\mathrm{Var}(\bar{x}) \propto n^{-1}$, a given estimator
precision can be obtained by adjusting the sample size, if $\sigma$ is known or assumed. A
sampling plan can be optimized for total cost by assuming some relationship between
$\mathrm{Var}(\bar{x})$ and cost in
construction or design. A common assumption is that this cost is proportional to the
square root of the variance, usually called the standard error of the mean,
$\sigma_{\bar{x}} = [\mathrm{Var}(\bar{x})]^{1/2}$.
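As a minimal sketch of equations (1) to (3), the following Python snippet computes the sample mean, the sample variance, and the standard error of the mean including the finite population factor. The function name `srs_mean_estimate`, the population size of 200, and the strength values are assumptions made here purely for illustration; they do not come from the notes.

```python
import math

def srs_mean_estimate(x, N):
    """Sample mean, sample variance, and standard error for a simple random sample.

    x : list of observed values (the sample)
    N : total size of the sampled population (hypothetical input)
    """
    n = len(x)
    mean = sum(x) / n                                  # equation (1)
    s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)   # equation (3)
    var_mean = (s2 / n) * (N - n) / N                  # equation (2), s2 used in place of sigma^2
    return mean, s2, math.sqrt(var_mean)

# Made-up strength values drawn from an assumed population of 200 units
sample = [30.0, 32.0, 28.0, 31.0, 29.0]
mean, s2, se = srs_mean_estimate(sample, N=200)
print(mean, s2, se)
```

Because the sample variance replaces the unknown population variance, the returned standard error is itself an estimate.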
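It is usually assumed that the estimates $\bar{y}$ and $\hat{Y}$ are normally distributed about
the corresponding population values. If the assumption holds, lower and upper confidence
limits for the population mean and total are as follows, where $f = n/N$ is the sampling
fraction:

Mean:
\[ \bar{Y}_L = \bar{y} - \frac{t s}{\sqrt{n}} \sqrt{1 - f}, \qquad \bar{Y}_U = \bar{y} + \frac{t s}{\sqrt{n}} \sqrt{1 - f} \]

Total:
\[ \hat{Y}_L = N\bar{y} - \frac{t N s}{\sqrt{n}} \sqrt{1 - f}, \qquad \hat{Y}_U = N\bar{y} + \frac{t N s}{\sqrt{n}} \sqrt{1 - f} \]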
     
 
The symbol t is the value of the normal deviate corresponding to the desired confidence
probability. The most common values are tabulated below:

Confidence probability (%)    50     80     90     95     99
Normal deviate, t             0.67   1.28   1.64   1.96   2.58
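The tabulated normal deviates and the confidence limits for the mean and total can be sketched in Python as follows. This is an illustrative sketch only: the function names `normal_deviate` and `confidence_limits` are hypothetical, and the sampling fraction is assumed to be f = n/N.

```python
import math
from statistics import NormalDist

def normal_deviate(confidence):
    """Two-sided normal deviate t for a confidence probability such as 0.80."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)

def confidence_limits(ybar, s, n, N, t):
    """Lower and upper limits for the population mean and the population total."""
    f = n / N                                           # sampling fraction (assumed f = n/N)
    half = t * s / math.sqrt(n) * math.sqrt(1.0 - f)    # half-width for the mean
    mean_limits = (ybar - half, ybar + half)
    total_limits = (N * (ybar - half), N * (ybar + half))
    return mean_limits, total_limits

# Reproduce the tabulated deviates to two decimals
for p in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(int(p * 100), round(normal_deviate(p), 2))
```

For small samples, the value of t passed to `confidence_limits` would come from a Student's t table instead of `normal_deviate`.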
 
If the sample size is less than 60, the percentage points may be taken from Student's
t table with (n-1) degrees of freedom, these being the degrees of freedom in the estimated
$s^2$. The t distribution holds exactly only if the observations $y_i$ are themselves normally
distributed and N is infinite. Moderate departures from normality do not affect it greatly.
For small samples with very skewed distributions, special methods are needed. An example
of the application is as follows.
Example.  
At a site, 676 borehole data sheets are available to characterize the substrata and obtain
design parameters. In each borehole data sheet, 42 entries reflecting the various characteristics
of soils, viz. compressibility, shear strength, compaction control, permeability, etc., are
indicated. An audit revealed that in some data sheets not all of the entries had been
filled in. The audit party verified a random sample of 50 sheets (about a 7% sample), and
the results are indicated in Table 1.

Table 1. Results for a sample of 50 borehole data sheets

Number of entries, y_i    Frequency, f_i
42                        23
41                        4
36                        1
32                        1
29                        1
27                        2
23                        1
19                        1
16                        2
15                        2
14                        1
11                        1
10                        1
9                         1
7                         1
6                         3
5                         2
4                         1
3                         1
Total, Σ f_i              50
 
 
We find

\[ n = \sum f_i = 50, \qquad y = \sum f_i y_i = 1471, \qquad \sum f_i y_i^2 = 54{,}497 \]

Hence the estimated total number of entries is

\[ \hat{Y} = N\bar{y} = \frac{(676)(1471)}{50} = 19{,}888 \]
 
For the sample variance $s^2$ we have

\[ s^2 = \frac{1}{n-1} \left[ \sum f_i (y_i - \bar{y})^2 \right] = \frac{1}{n-1} \left[ \sum f_i y_i^2 - \frac{\left( \sum f_i y_i \right)^2}{\sum f_i} \right] \]
 
\[ = \frac{1}{49} \left[ 54{,}497 - \frac{(1471)^2}{50} \right] = 229.0 \]
 
 
The 80% confidence limits are given by

\[ 19{,}888 \pm \frac{t N s}{\sqrt{n}} \sqrt{1 - f} = 19{,}888 \pm \frac{(1.28)(676)(15.13)\sqrt{1 - 0.0740}}{\sqrt{50}} \]
 
 
This gives 18,107 and 21,669 for the 80% limits. A complete count showed 21,045
entries, which lies within these limits and is close to the upper limit.
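The arithmetic of this example can be checked with a short Python sketch that works directly from the frequency table. The variable names are chosen here for illustration, and t = 1.28 is taken from the table of normal deviates for 80% confidence. Since the hand calculation rounds intermediate values, the computed limits may differ from the quoted ones by about one unit.

```python
import math

# Frequency table from the example: value y_i -> frequency f_i
table = {42: 23, 41: 4, 36: 1, 32: 1, 29: 1, 27: 2, 23: 1, 19: 1, 16: 2, 15: 2,
         14: 1, 11: 1, 10: 1, 9: 1, 7: 1, 6: 3, 5: 2, 4: 1, 3: 1}
N = 676     # number of data sheets in the population
t = 1.28    # normal deviate for 80% confidence

n = sum(table.values())                              # sample size, sum of f_i
sum_fy = sum(f * y for y, f in table.items())        # sum of f_i * y_i
sum_fy2 = sum(f * y * y for y, f in table.items())   # sum of f_i * y_i^2

total_hat = N * sum_fy / n                           # estimated population total
s2 = (sum_fy2 - sum_fy ** 2 / n) / (n - 1)           # sample variance
half_width = t * N * math.sqrt(s2) / math.sqrt(n) * math.sqrt(1 - n / N)

print(n, sum_fy, sum_fy2)
print(round(total_hat), round(s2, 1))
print(total_hat - half_width, total_hat + half_width)
```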
 
4.2.2. Systematic sampling 
In systematic sampling the first observation is chosen at random and subsequent
observations are chosen periodically throughout the population. To select a sample of n
units, we take a unit at random from the first k units and every k-th unit thereafter. The
method thus involves selecting every k-th element from a sampling frame, where k, the
sampling interval, is calculated as:
k = population size (N) / sample size (n)
Using this procedure, each element in the population has a known and equal probability of
selection. This makes systematic sampling functionally similar to simple random
sampling. It is, however, more efficient (if the variance within the systematic samples is
larger than the variance of the population as a whole) and much less expensive to carry
out. The advantages of this approach are that (1) mistakes in sampling are minimized and
the operation is quick, and (2) the sample is spread uniformly over the population and is
therefore likely to be more precise than simple random sampling.
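The selection rule described above can be sketched in Python as follows. This is a minimal illustration under the assumption that N is an exact multiple of n, so that the sampling interval k is an integer; the function name `systematic_sample` and the numbered list of borehole locations are hypothetical.

```python
import random

def systematic_sample(population, n, rng=None):
    """Pick every k-th unit after a random start within the first k units.

    Assumes N = len(population) is an exact multiple of n, so k = N / n is an integer.
    """
    rng = rng or random.Random()
    N = len(population)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch requires N to be a positive multiple of n")
    k = N // n                    # sampling interval
    start = rng.randrange(k)      # random start among the first k units (0-based index)
    return population[start::k]   # that unit and every k-th unit thereafter

# 100 hypothetical borehole locations numbered 1..100, sample of 10
boreholes = list(range(1, 101))
sample = systematic_sample(boreholes, n=10, rng=random.Random(0))
print(sample)
```

Only the starting unit is random; once it is fixed, the rest of the sample is fully determined, which is why a periodic pattern in the population with a period related to k can affect the precision of the estimate.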
An unbiased estimate of the mean from a systematic sample is the same as equation (1)
above. The sampling variance of this estimate is