CHAPTER 5: Kriging
Let us turn, now, to a much more common, and probably more realistic, approach
to the estimation of ‘local’ values. Until now we have considered only the
operation of averaging all the local samples and applying this value as the
estimate of the area under consideration. There are cases in which it would not
seem sensible to weight all the samples equally, since some will be a great
distance from the ‘unknown’ area A, whilst others will be much closer
to it --- if not inside. It would seem more sensible to use a weighted average
of the sample values, with the ‘closer’ samples having more weight. The new
estimator will be of the form:
![]()
where the
weightings sum to 1. If this condition is met, and there is no trend (locally) then T* is an unbiased estimator. This means that
over a lot of estimations the average error will be zero. This type of
estimator is called a ‘linear’ estimator because it is a linear combination of
the sample values. The arithmetic mean is simply a special case where all of
the weights are equal. It can be shown that the estimation variance for the
general ‘unbiased linear’ estimator is:

Where we previously calculated
(S,A) which was the average semi-variogram
between each sample and the unknown area, we now form a weighted average for
each individual sample with the area A,
(Si,A), in the same way as the actual estimator.
For example, if we have n samples taken around the area A, the first term in the variance becomes:

The last term in the variance,
(A,A), does not change its form, since we have only changed
the form of the estimator, not the area being estimated. The term
(S,S) which previously measured the variation
in values between the samples, must now take into account the different weights
associated with each sample. Thus, if we take, e.g., sample 4 we must remember
that it has a weight w4. If we take sample 4 with, say,
sample 2, then we must include both weights w2 and w4, so that the term becomes
w2 w4
(S2,S4)
To form the equivalent
(S,S), then, each
(Si,Sj) is multiplied by the corresponding
wiwj before being added into the sum.

Fig. 5.1. Three
samples to be used to estimate the line segment.
As a simple example, let us consider the setup in Fig. 5.1.
We calculated the extension variance for this setup in Chapter 4. The
arithmetic mean gave an extension variance of
pl/18 when we used a linear
semi-variogram with the form g(h) =ph. Now suppose we allocate a set of weights
to these three samples instead of treating them all the same. Let us, for this
example, put three-quarters of the ‘weight’ at the central point, and an eighth
on each end point. The ‘new’ estimator is therefore:

The
reliability of this estimator will be given by the general form of se². The term
(A,A) is given by the function F(l), by definition, which for the linear
semi-variogram takes the form pl/3. The central term --- the ‘within
samples’ term --- is:

where each sample is taken with each sample in turn ---
including itself --- and the combination is multiplied by both weights. Since
the samples in this case are all points, all the
(Si,Sj) terms can be found directly from
the semi-variogram g(h). This gives:

Since the semi-variogram model is linear, this becomes:

This leaves only the first term --- the between sample and
area term --- to be evaluated. This is:
![]()
Sample 1 is at one end of the length l, so that
(S1,A) =c(l). Similarly
(S3,A) =c(l). Sample 2 is the central point of the
length l, so that
(S2,A) =c( l/2). For the linear model c(l) =pl/2, so that

Putting all these together, gives an estimation variance
of:

The specified set of weights, (1/8, 3/4, 1/8) when used with a linear semi-variogram model
give an estimation variance of 0.0729pl. The arithmetic mean of the
samples gave an extension variance of pl/18=0.0556pl, which is three-quarters of the size of
the estimation variance above. It is fairly obvious in this case that the specified
weighted average gives a worse estimator than simply using the arithmetic mean.

Fig. 5.2. Estimation of the block value is required from the five scattered samples --- uranium example.
As a two-dimensional example, let us return to the ubiquitous
uranium example as illustrated in Fig. 5.2. We shall allocate weights to each
sample according to its distance from the centre of the block A. Table 5.1 shows the ‘inverse distance’ calculation
for the weights in this example.
Table 5.1. Calculation of Inverse Distance weightings for hypothetical Uranium estimation problem
|
Sample |
Distance from |
Inverse |
Corrected |
|
number |
centre (ft) |
distance |
weight |
|
1 |
21.54 |
0.0464 |
0.319 |
|
2 |
50.00 |
0.0200 |
0.137 |
|
3 |
31.62 |
0.0316 |
0.217 |
|
4 |
30.00 |
0.0333 |
0.229 |
|
5 |
70.00 |
0.0143 |
0.098 |
|
Total |
|
0.1457 |
1.000 |
The
estimator T* becomes:

The three terms which go into the variance are:

The individual values of
(S1,A) and so on are the same as those evaluated
when considering the extension variance, and are equal to 356.7, 572.4, 456.9,
446.8 and 696.1 respectively. Thus the first term in the variance is equal to
461.9 (p.p.m.)²--- as opposed to 505.8 (p.p.m.)² in the extension variance. The second term in the
variance of the weighted mean turns out to be 441.2 (p.p.m.)². In the extension case it was 504.7 (p.p.m.)². The final term in both variances is 344.0 (p.p.m.)² as given by the auxiliary function F(l,b). Putting these Figures together gives us
an estimation variance for the inverse distance estimator of 138.6 (p.p.m.)². This is equivalent to a ‘standard error’ of 11.8
p.p.m. Thus, using the inverse distance weights we obtain an estimate of 372.8
p.p.m. U3O8
for the panel, and we can say that this has a standard error of 11.8 p.p.m.,
which has been derived from our knowledge of the semi-variogram of this
deposit. If we are willing to make the additional assumption of Normality of
the errors, we could say that a 95% confidence interval for the ‘true’ value of
the panel is (349 p.p.m., 396 p.p.m.). This should be compared to the extension
estimate of 366 p.p.m., with a standard error of 12.8 p.p.m., and a 95%
confidence interval of (340 p.p.m., 392 p.p.m.). It can quickly be seen that
the inverse distance estimation produces a (slightly) more accurate result than
the arithmetic mean --- this seems quite sensible.
Suppose we change the situation slightly. Suppose
that sample 3 was not north of the block but south, i.e. northing 2310, as in
Fig. 5.3.

Fig. 5.3. Sample 3 is
now located South of the block to be estimated.
The inverse distance weights are
unchanged, because they depend on the distance between the sample and the block
centre.
However,
the estimation standard error rises sharply to 14.3 p.p.m., indicating a loss
in accuracy. The change in the estimation variance is caused solely by a
decrease in the size of the ‘within samples’ term --- the amount of information
contained in the sample set --- since the other two terms remain the same as
before. In actual Figures the term falls from 441.2(p.p.m.)² to 376.2(p.p.m.)². It does not really seem very
sensible to use the same weights for Fig. 5.3 as we did for Fig. 5.2, since
sample 3 now gives us a lot less ‘information’ than it did previously. We could
suggest a new set of weights and calculate the estimation variance. If this
were smaller than the ‘inverse distance’ set, we could say our ‘new’ estimator
was (in some sense) better than the inverse distance one. Alternatively, we
could use a different method to produce the weights --- inverse distance
squared, for example. It would seem a lot more desirable to find a direct
method of producing the ‘best’ estimate, given our knowledge of the deposit.
We have decided to use a ‘linear’ type of estimator --- a
weighted average of the sample values. We know that it is an unbiased estimator
if the weights add up to 1. There is an infinite number of such linear unbiased
estimators, so we will search for the ‘best’ one, and we will define ‘best’ as
‘having the smallest estimation variance’. The expression for the estimation
variance depends on three things: the basic geometry of samples and unknown
area, the form of the semi-variogram, and the weighting allocated to each
sample. For any given setup the variance can only be changed by altering the
values of the weights. Thus we
wish to minimise the estimation variance with respect to the weights. The
variance is a simple (?) function of the weights, so to minimise it we must
differentiate and set the differential equal to zero:
![]()
This will
provide n equations in n unknowns (w1,
w2, w3, w4...wn). These weights will provide an
estimator which has the minimum value of the estimation variance. However, they
will not necessarily add up to one. There is nothing in the above system of
equations that constrains the weights in this way. Effectively, we also need to
satisfy the equation Swi=1. Thus, to obtain the ‘Best Linear
Unbiased Estimator’ we actually have to satisfy (n+1) equations. However, we only have n unknowns, so far, which is not a very desirable
condition. To rectify this we must introduce another unknown, in the form of a
Lagrangian Multiplier, to balance up the system. Therefore, instead of
minimising the estimation variance, we actually minimise:
![]()
with respect to w1,
w2, w3, w4... wn, and l. This last produces the equation:
S wi – 1 = 0
as is required. After the differentiation has been
performed and the equations tidied up, the following system results:

Although this looks slightly fearsome in its complexity, if
you look a little closer, you will find that most of the elements are (I hope)
by now familiar to you. Consider the first equation. The right hand side merely
requires the average semi-variogram value between sample 1 and the unknown
area. The left hand side contains the n+1 unknowns, wi and l, and the average semi-variogram
value between sample 1 and each of the other samples in turn. All of these
terms are identical to those which we would
work out for an extension or an estimation variance.
The second equation is identical to the first except that
it is sample 2 which is present right along the equation. The third has sample
3 all the way along, and so on until the nth equation with Sn all the way along. Finally, we
have the necessary condition for the sum of the weights. The solution to this
set of equations will produce a set of weights giving the ‘Best Linear Unbiased
Estimator’ --- sometimes referred to as BLUE. This process was named Kriging by
Georges Matheron after Danie Krige, who has done a tremendous amount of
empirical work on weighted averages. Pronunciation of the word is a
controversial topic --- I personally prefer ‘kridging’ (as in bridging). The
system of equations is generally referred to as the kriging system, and the
estimator produced is the kriging estimator. The variance of the kriging
estimator could be found by substitution of the weights into the general
estimation variance equation. However, it can be shown that the kriging
variance can be written as:
![]()
Earlier
in this chapter we took the setup in Fig.5.1, a semi-variogram of the form g(h)=ph, and allocated a set of weights to each
of the three samples. We showed that in that case the estimation variance was
0.0729pl. Now let us see if we can find the ‘best’
set of weights using the kriging system. Since we have three weights, there are
four equations which in the general form are:

and the kriging variance would be:
![]()
The right hand side of the equations are
(S1,A) ,
(S2,A)
and
(S3,A).
(S1,A) is the average semi-variogram value
between a line of length l and a point on the end of it. This
is the definition of c(l). So is
(S3,A).
(S2,A) is the average semi-variogram between the
line and a central point. This example was tackled in Chapter 4 and found to be
c(l/2). Thus we have the right hand side of
the equations and most of the variance. The left hand side of the equations is
made up of the individual sample-with-sample terms. Since all of the samples in
this case are supposed to be points, all of these ‘left hand side’
relationships are given by the semi-variogram model. It remains only to
calculate the distances between the pairs and use the model to produce the
terms. The diagonal terms,
(S1,S1),
(S2,S2) and
(S3,S3) are all zero, since g(0) =0 by definition,
(S1,S2) is equal to
(S2,S1), and is g(h) ( l/2) . So is
(S2,S3) and
(S3,S2).
(S1,S3) is g (l) as is
(S3,S1). Finally,
(A,A) is F(l) by definition. Putting these
together gives:
|
|
w2 γ(l/2) |
+ w2 γ(l) |
+ l |
= |
c(l) |
|
w1
γ(l/2) |
|
+ w2 γ(l/2) |
+ l |
= |
c(l/2) |
|
w1
γ(l) |
+ w2 γ(l/2) |
|
+ l |
= |
c(l) |
|
w1
|
+ w2 |
+ w3 |
|
= |
1 |
and the
kriging variance is
![]()
Up until this point the system does not depend on the
actual model for the semi-variogram. Our model was g(h) =ph and for this example we will take p=4. For the linear semi-variogram, c(l) =pl/2, so in this case c(l) =2l. Similarly, F(l) =4l/3. Substituting in the above system:
|
|
2lw2 |
+ 4lw2 |
+ l |
= |
2l |
(1) |
|
2lw1 |
|
+ 2lw2 |
+ l |
= |
l |
(2) |
|
4lw1 |
+ 2lw2 |
|
+ l |
= |
2l |
(3) |
|
w1
|
+ w2 |
+ w3 |
|
= |
1 |
(4) |
and
![]()
Adding equations (1)
and (3) gives
![]()
whilst equation (4)
gives
![]()
These two together show that in this case l=0. Thus we can eliminate l from the first three equations,
and divide all of them through by l. This suggests that the results
--- the final values of the weights --- do not rely on the length being
estimated. We have then:
|
|
2w2 |
+ 4w2 |
= |
2 |
(5) |
|
2w1 |
|
+ 2w2 |
= |
1 |
(6) |
|
4w1 |
+ 2w2 |
|
= |
2 |
(7) |
Subtracting
equation (5) from equation (7) gives:
![]()
so that equation (6) gives:
![]()
Therefore, w3 = 1/4 and w2 = 1/2. The ‘optimal’ set of weights for
the problem in Fig. 5.1 is therefore (1/4, 1/2, 1/4) and this estimator gives a kriging variance
of:
![]()
The final result is therefore that:
the BLUE has weights of (1/4, 1/2, 1/4) and a kriging
variance of l/6.
In our previous studies of this particular setup, we found that
the arithmetic mean gave an extension variance of pl/18 and that the set of weights (1/8, 3/4,
1/8) gave 7pl/96. To match the example above we should
set p=4, so that the variances become 2l/9 and 7l/24 respectively. The kriging
procedure has improved over the arithmetic mean by about 25%, this being the
difference in magnitude between the extension variance and the kriging
variance. The spurious set of weights which we tried earlier in this chapter
give a variance almost twice that of the optimal estimator. Note that in this
case, since we have used a linear
semi-variogram model, the set of weights is independent of the length being
estimated, but the variance is directly proportional to it. For an exercise,
see if you can show that the set of weights is also independent of the slope of the semi-variogram, p. Show also that the kriging variance is directly
proportional to p, which seems only sensible.
Now let
us return to the familiar uranium example shown in Fig. 5.2. The data ---
position co-ordinates and grade values --- were given in Table 4.1. We have
five samples, so there will be six equations in the kriging system. The
(Si,Sj) terms on the left hand side of the
equations are all ‘point-with-point’ relationships, and can be evaluated
directly from the semi-variogram model. The model was spherical, with a range
of influence of 100ft, a sill of 700 (p.p.m.)² and a nugget effect of 100
(p.p.m.)². The
(Si,A) terms are all ‘point-with-panel’
relationships and hence are simple combinations of the H(l,b) auxiliary function. The
(A,A) term is, of course, F(l,b).
The kriging system turns out to be:

and the kriging variance would be:
![]()
Solving this system of equations (by computer) gives:

This kriging standard deviation compares favourably with
the standard error of the ‘inverse distance’ weights. The main difference in
the weighting is perhaps a surprising one at first sight. Sample 2 is allocated
an almost zero weight. In fact, it receives only 20% of the weight on sample 5,
which is considerably farther from the block centre. This is because the
kriging system automatically takes account of the relationship amongst the
samples. Sample 2 provides little extra information about the block in the
presence of sample 1, and to a lesser extent samples 3 and 4. To illustrate
this point further, let us consider the situation in Fig. 5.3, where sample 3
was moved to the south of the block. The kriging system now becomes:

Note that the only difference between this
and the previous system is in row 3 and column 3 on the left hand side. The
right hand side is unchanged since we have not changed the relationship of each
individual sample to the panel. The new weights are:

The weight on the third sample is now less
than that of sample 2. The main movement in weight has been towards the
‘northern’ samples, 1, 2 and 5, although the largest increase is, naturally, in
sample 1. The really striking change is in the value of the estimator which has dropped by 17 p.p.m.
As a
third example let us consider the same sampling situation as before, but now
with the block rotated through 90 degrees, as in Fig. 5.4.

Fig. 5.4. Block to be estimated is
rotated through 90º.
The kriging
system for this setup has the same left hand side as the first situation in
Fig. 5.2. However, all of the terms on the right hand side have changed, to:
![]()
The weights allocated to the samples change drastically:

The
estimator T* takes the value 371.9 p.p.m., and has a
standard error of 10.7 p.p.m. Sample 2 has been completely screened out by
sample 1, and the influence of sample 5 has also declined somewhat. The biggest
surprise, perhaps, is that sample 1 no longer has anything like the importance
it had in the previous two examples. There is no method other than kriging in
which this shift in ‘information value’ can be quantified and utilised.
1. We can evaluate the accuracy of any linear estimator if we have a model for
the semi-variogram.
2. We can produce the minimum variance
unbiased linear estimator using the kriging technique, if we have a model for
the semi-variogram.
The standard litany of the advantages of kriging can be
found in numerous publications. The points of major importance are:
a) Given the basic
assumptions, no trend, and a model for the semi-variogram, kriging always produces the Best Linear Unbiased
Estimator.
b) If the proper models are
used for the semi-variogram, and the system is set up correctly, there is always a unique solution to the kriging
system.
c) If you try to ‘estimate’
the value at a location which has been sampled, the kriging system will return
the sample value as the estimator, and a kriging variance of zero. In other
words, you already know that value. This is
usually referred to as an ‘exact interpolator’.
d) If you have regular sampling, and
hence the same sampling/block setup at many different positions within the
deposit, it is not necessary to recalculate the kriging system each time.
To round out this chapter, let us return
to the simulated iron ore deposit mentioned at the end of Chapter 4. Two sets
of samples had been taken. The 50 ‘random’ samples were shown in Fig. 4.14 and the
regular grid in Fig. 4.17.
The regular grid actually comprises 41
samples. As before, the 400-metre-square area was divided into 50-metre-square
blocks. However, this time the block values were estimated by the method of
kriging. The range of influence of the semi-variogram model for this deposit
was 100m, so all samples within this distance of a block were included in its
estimation. The results are shown in Fig. 5.5, and once again the upper Figure
in each block is the estimated value, whilst the lower is the kriging standard
deviation, or standard error. For comparison, Fig. 5.6 shows the kriging
solution for the case where the area is divided into 100-metre-square blocks.
Notice that the kriging standard deviations in all cases are much lower than for
the 50-metre blocks.
|
Fig. 5.5. Simulated
iron ore deposit --- block averages kriged from the set of random samples and
the corresponding kriging standard deviations --- 50 metre blocks. |
Fig. 5.6. Simulated
iron ore deposit --- block averages kriged from the set of random samples and
the corresponding kriging standard deviations --- 100 metre blocks. |
This once again bears out the principle
that it is ‘easier’ to estimate large areas rather than small ones. Figure 5.7
shows the estimated values for the blocks when using the sample values from the
regular grid. The kriging standard deviation for all of these blocks is 2.4%Fe.
This is not markedly different from the extension standard deviation of 2.5%Fe.
Perhaps our conclusion here must be that with a regular grid of this particular
size, the arithmetic mean of the two ‘corner’ samples is as good an estimator
as a weighted average of all samples within 100m of the block. The exterior
samples would seem to be superfluous in the circumstances. This conclusion does
not hold for the irregular sampling, for which some great improvements in
accuracy result from the application of kriging.

Fig.
5.7. Simulated iron ore deposit --- block averages kriged from the set of
regular samples --- 50 metre blocks.
The usual requirement in ore reserve
estimation is the production of block values. However, in many other possible
applications of kriging --- such as geochemistry or hydrology --- the estimation
desired is in the form of ‘point’ values, or a contour map of the variable of
interest. Kriging can be used to produce the close grid of values necessary to
the plotting of contour maps. In fact, the procedure is very much easier than
‘area’ estimation, since all the ‘average semi-variogram values’ reduce to
simple values of the semi-variogram model itself. Since all of the observations
are made at specified points, the left hand side of the kriging system is
‘point-with-point’ semi-variogram values. Since the value to be estimated is
also at a point, the right hand side is also ‘point-with-point’ semi-variogram
values. Figure 5.8 shows the contour map produced using the 50 randomly chosen
samples. The blacked-out area at the top of the map is outside the range of
influence of any sample, and hence cannot be estimated.
|
Fig.
5.8. Simulated iron ore deposit --- contour map kriged from random samples. |
Fig.
5.9. Simulated iron ore deposit --- kriged standard deviation map for the kriged
contour map from random samples. |
One of the advantages of kriging as an interpolation
technique is that every estimate is accompanied by a corresponding kriging
standard deviation. Thus, for any contour map of values, a companion map of ‘reliability’
can be produced. This is shown in Fig. 5.9. The location of the sample points
can easily be seen by the concentration of the low value contours in Fig. 5.9
--- the 1%Fe and 2%Fe contours. The highest possible contour value would be 5Ï2=7.07%Fe, which is the boundary around the blacked
out area. This corresponds to trying to estimate the value at a point which is
just on the range of influence away from the nearest sample.
The ‘unreliable’ areas are quite clearly
outlined by the 5%Fe contour (the sample standard deviation). A standard error
which is larger than the original sample standard deviation denotes a rather
unreliable prediction.
|
Fig.
5.10. Simulated iron ore deposit --- contour map kriged from regular samples. |
Fig.
5.11. Simulated iron ore deposit --- map of kriging standard deviations for
map kriged from regular samples. |
Figure 5.10 shows the interpolated contour map using the
regular samples, and Fig. 5.11 the corresponding map of the kriged standard
deviation. Note that, even though only 41 samples are available, and even
though these are on a very large grid (71m), the highest contour value in Fig.
5.11 is only 4%Fe.
An
additional advantage of kriging as an estimation technique is that the maps
and/or calculations of the ‘standard errors’ can be produced without actually taking the samples. For
example, if infill drilling were proposed on the regular grid, it is fairly
obvious from Fig. 5.11 where the new samples should be taken. If a decision was
taken to reduce the grid to 50m, i.e. put a hole in the centre of each 4%Fe
contour, a complete new map of the resulting standard error could be drawn
before setting foot in the field.