# Chapter 4 Sampling - Notes, Civil Engineering, Semester Notes | EduRev

Created by: Renu Garg

```

4. Sampling
4.1. Concepts of Sampling
Many variables in civil engineering are spatially distributed, for example the concentration
of pollutants and material properties such as the strength and stiffness of concrete and
soils. The purpose of sampling is to obtain estimates
of population parameters (e.g. means, variances, covariances) to characterize the entire
population distribution without observing and measuring every element in the sampled
population. Sampling theory for spatial processes principally involves evaluation of
estimators' sampling distributions and confidence limits. A very good introduction to
these methods and to the uses and advantages of sampling is provided by Cochran (1977)
and Baecher and Christian (2003).
An estimate is the realization of a particular sample statistic for a specific set of sample
observations. Estimates are not exact and uncertainty is reflected in the variance of their
distribution about the true parameter value they estimate. This variance is, in turn, a
function of both the sampling plan and the sampled population. By knowing this variance
and making assumptions about the shape of the distribution, confidence limits on the true
population parameters can be set.
A sampling plan is a program of action for collecting data from a sampled population.
Common plans are grouped into many types: for example, simple random, systematic,
stratified random, cluster, traverse, line intersects, and so on. In deciding among plans, or
in designing a specific program once the type of plan has been chosen, one attempts to
obtain the highest precision for a fixed sampling cost, or the lowest sampling cost for a
fixed precision or a specified confidence interval.
4.2. Common Spatial Sampling Plans
Statistical sampling is a common activity in many human enterprises, from the national
census, to market research, to scientific research. As a result, common situations are
encountered in many different endeavors, and a family of sampling plans has grown up to
handle these situations. Simple random sampling, systematic sampling, stratified random
sampling, and cluster sampling are considered in the following sections.
4.2.1. Simple random sampling
The characteristic property of simple random sampling is that individuals are chosen at
random from the sampled population, and each element of the population has an equal
probability of being observed. An unbiased estimator of the population mean from a
simple random sample x = {x_1, ..., x_n} is the sample mean

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i                                   (1)

This estimator has sampling variance

    Var(\bar{x}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N}                     (2)

where \sigma^2 is the (true) variance of the sampled population and N is the total sampled
population size. The term (N-n)/N is called the finite population factor, which for n less
than about 10% of N can safely be ignored. However, since \sigma^2 is usually unknown, it is
estimated by the sample variance

    s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2                        (3)

in which the denominator is taken as n-1 rather than n, reflecting the loss of a degree of
freedom due to estimating the mean from the same data. The estimator is unbiased but
does not have minimum variance. The only choice (i.e. allocation) to be made in simple
random sampling is the sample size n. Since the sampling variance of the mean is
inversely proportional to sample size, Var(\bar{x}) \propto n^{-1}, a given estimator precision can be
obtained by adjusting the sample size, if \sigma is known or assumed. A sampling plan can be
optimized for total cost by assuming some relationship between Var(\bar{x}) and cost in
construction or design. A common assumption is that this cost is proportional to the
square root of the variance, usually called the standard error of the mean,
\sigma_{\bar{x}} = Var(\bar{x})^{1/2}.
It is usually assumed that the estimates \bar{y} and \hat{Y} are normally distributed about the
corresponding population values. If the assumption holds, lower and upper confidence
limits for the population mean and total are as follows, where f = n/N is the sampling
fraction:

Mean:
    \bar{Y}_L = \bar{y} - \frac{t s}{\sqrt{n}} \sqrt{1 - f},    \bar{Y}_U = \bar{y} + \frac{t s}{\sqrt{n}} \sqrt{1 - f}

Total:
    Y_L = N\bar{y} - \frac{t N s}{\sqrt{n}} \sqrt{1 - f},    Y_U = N\bar{y} + \frac{t N s}{\sqrt{n}} \sqrt{1 - f}

The symbol t is the value of the normal deviate corresponding to the desired confidence
probability. The most common values are tabulated below:

Confidence probability (%)    50     80     90     95     99
Normal deviate, t             0.67   1.28   1.64   1.96   2.58

If the sample size is less than 60, the percentage points may be taken from Student's
t table with (n-1) degrees of freedom, these being the degrees of freedom in the estimated
s^2. The t distribution holds exactly only if the observations y_i are themselves normally
distributed and N is infinite. Moderate departures from normality do not affect it greatly.
For small samples with very skew distributions, special methods are needed. An example
of the application is as follows.
Example.
At a site, the number of borehole data sheets used to characterize the substrata and obtain
design parameters is 676. In each borehole data sheet, 42 entries reflecting the various characteristics
of soils, viz. compressibility, shear strength, compaction control, permeability, etc., are
indicated. An audit revealed that in some data sheets not all the data were
entered. The audit party verified a random sample of 50 sheets (about a 7% sample), and the
results are given in Table 1.
Table 1. Results for a sample of 50 borehole data sheets

Number of entries, y_i    Frequency, f_i
        42                      23
        41                       4
        36                       1
        32                       1
        29                       1
        27                       2
        23                       1
        19                       1
        16                       2
        15                       2
        14                       1
        11                       1
        10                       1
         9                       1
         7                       1
         6                       3
         5                       2
         4                       1
         3                       1
    \sum f_i                    50

We find

    n = \sum f_i = 50,    y = \sum f_i y_i = 1471,    \sum f_i y_i^2 = 54,497

Hence the estimated total number of entries is

    \hat{Y} = N\bar{y} = \frac{(676)(1471)}{50} = 19,888

For the sample variance s^2 we have

    s^2 = \frac{1}{n-1} \sum f_i (y_i - \bar{y})^2
        = \frac{1}{n-1} \left[ \sum f_i y_i^2 - \frac{(\sum f_i y_i)^2}{\sum f_i} \right]

        = \frac{1}{49} \left[ 54,497 - \frac{(1471)^2}{50} \right] = 229.0

The 80% confidence limits are given by

    19,888 \pm \frac{t N s}{\sqrt{n}} \sqrt{1 - f} = 19,888 \pm \frac{(1.28)(676)(15.13) \sqrt{1 - 0.0740}}{\sqrt{50}}

where s = \sqrt{229.0} = 15.13 and f = 50/676 = 0.0740.
This gives 18,107 and 21,669 for the 80% limits. A complete count showed 21,045
entries, which is close to the upper limit.

4.2.2. Systematic sampling
In systematic sampling the first observation is chosen at random and subsequent
observations are chosen periodically throughout the population. To select a sample of n
units, we take a unit at random from the first k units and every k-th unit thereafter. The
method involves the selection of every k-th element from a sampling frame, where k, the
sampling interval, is calculated as:
    k = population size (N) / sample size (n)
Using this procedure each element in the population has a known and equal probability of
selection. This makes systematic sampling functionally similar to simple random
sampling. It is, however, much more efficient (if the variance within the systematic sample is
greater than the variance of the population) and much less expensive to carry out. The advantages
of this approach are that 1) mistakes in sampling are minimized and the operation is
speedy, and 2) the sample is spread uniformly over the population and is likely to be more
precise than simple random sampling.
An unbiased estimate of the mean from a systematic sample is given by Equation (1) above.
The sampling variance of this estimate is
```
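The worked example in Section 4.2.1 can be checked numerically. The sketch below is only an illustration, not part of the original notes: the function name `srs_total_with_limits` and the dictionary layout are assumptions, and the frequency table is copied from the example (N = 676 sheets, n = 50 sampled, t = 1.28 for 80% confidence).

```python
import math

# Frequency table from the worked example: value y_i -> frequency f_i
FREQ = {42: 23, 41: 4, 36: 1, 32: 1, 29: 1, 27: 2, 23: 1, 19: 1, 16: 2, 15: 2,
        14: 1, 11: 1, 10: 1, 9: 1, 7: 1, 6: 3, 5: 2, 4: 1, 3: 1}


def srs_total_with_limits(freq, N, t):
    """Estimate a population total from a simple random sample.

    Hypothetical helper (name and return layout are assumptions).
    freq maps each observed value y_i to its frequency f_i,
    N is the population size, t is the normal deviate.
    """
    n = sum(freq.values())                            # sample size, sum of f_i
    sum_y = sum(f * y for y, f in freq.items())       # sum of f_i * y_i
    sum_y2 = sum(f * y * y for y, f in freq.items())  # sum of f_i * y_i^2
    y_bar = sum_y / n                                 # sample mean
    s2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)          # sample variance, Eq. (3)
    f = n / N                                         # sampling fraction
    total = N * y_bar                                 # estimated total
    half_width = t * N * math.sqrt(s2) * math.sqrt(1 - f) / math.sqrt(n)
    return {"n": n, "sum_y": sum_y, "sum_y2": sum_y2, "s2": s2,
            "total": total, "lower": total - half_width,
            "upper": total + half_width}


result = srs_total_with_limits(FREQ, N=676, t=1.28)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The printed total, sample variance, and limits should agree with the values quoted in the example up to rounding, since the text rounds s to 15.13 and f to 0.0740 before computing the limits.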
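The selection rule of Section 4.2.2 (a random start among the first k units, then every k-th unit) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `systematic_sample` is hypothetical, and the population size N is assumed to be an exact multiple of n. If it is not, this simple version never selects the trailing N - n*k units, so the equal-probability property no longer holds exactly.

```python
import random


def systematic_sample(population, n, rng=None):
    """Draw a systematic sample of n units from an ordered population.

    Hypothetical helper: assumes len(population) is a multiple of n.
    """
    rng = rng or random.Random()
    N = len(population)
    k = N // n                      # sampling interval k = N / n
    if k < 1:
        raise ValueError("sample size n cannot exceed population size N")
    start = rng.randrange(k)        # random unit among the first k units
    # Take the starting unit and every k-th unit thereafter.
    return [population[start + j * k] for j in range(n)]


# Usage with an illustrative frame of 100 numbered units and n = 10 (k = 10).
frame = list(range(100))
sample = systematic_sample(frame, 10, random.Random(0))
print(sample)
```

Only one random number is drawn, which is why the method is quick to carry out in the field compared with drawing n independent random units.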