CHAPTER 1: Introduction
Perhaps
it would be useful here at the very beginning to clear up any possible
ambiguity which arises because of the use of the title Geostatistics. In the
early 1960s, after much empirical work by authors in
The application of Geostatistics to the estimation of ore
reserves in mining is probably its most well known use. However, it has been
emphasised time and again that the estimation techniques can be used wherever a
continuous measure is made on a sample at a particular location in space (or
time), i.e., where a sample value is expected to be affected by its position
and its relationships with its neighbours. Since most applications -- and most of
the author’s experience -- are confined to the mining field, so are most of
the examples in this book. Also, there will be a tendency to talk of ‘grades’
rather than ‘sample values’, for brevity if nothing else. If the reader is
interested in other fields, it suffices to replace ‘grade’ by porosity,
permeability, thickness, elevation, population density, rainfall, temperature,
fracture length, abundance or whatever.
The application of statistical methods to ore reserve
problems was first attempted some 30 years ago in
Subsequent to this work, attempts were made to incorporate
position and spatial relationships into the estimation procedure. Two things
seemed sensible: there should be ‘rich’ areas and ‘poor’ areas within a
deposit; and there should be some sort of relationship between one area and the
next. These were tackled in the 1950s and early 1960s by the introduction of
Trend Surface Analysis. In

Fig 1.1: Hypothetical sampling and estimation situation
Let us
consider the problem of local estimation, e.g. of trying to estimate the value
at, say, point A in Fig 1.1, given the samples at the various locations
shown. It seems reasonable to evolve an estimation procedure which gives more
importance to sample 1 than to sample 5. A whole range of methods have been
produced to decide on the ‘weight’ accorded to each sample, mostly based on the
distance of the sample from the point being estimated. Sample values may be
weighted by inverse-distance, inverse-distance squared, or by some arbitrary
constant (e.g. range of influence) minus the distance. All of these involve the
same basic assumption -- that the relationship between the value at point A and any sample value depends on the
distance (and possibly direction) between the two positions, and on nothing
else. It does not depend on whether one is in a rich or poor zone, or on the
actual sample values, but only on the geometric placing of the samples. In
fact, it does not even depend on the mineral in the deposit!
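The distance-weighting schemes just described fit in a few lines of Python. This is a minimal sketch, not a method taken from the text: the function names, the coordinates and the grades are all hypothetical (Fig 1.1 gives no numbers), and a sample lying exactly on the target point is simply returned as the estimate. The weights are computed from distances alone, and the grades never enter into them, which is exactly the assumption discussed above.

```python
import math


def weighted_estimate(target, samples, weight):
    """Estimate the value at `target` as a weighted average of sample values.

    target  -- (x, y) position of the point being estimated
    samples -- list of (x, y, value) tuples
    weight  -- function of distance only; the grades never affect the weights
    """
    tx, ty = target
    weights, values = [], []
    for x, y, value in samples:
        d = math.hypot(x - tx, y - ty)
        if d == 0.0:
            return value  # a sample sits exactly on the target point
        weights.append(weight(d))
        values.append(value)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("every sample received zero weight")
    return sum(w * v for w, v in zip(weights, values)) / total


def inverse_distance(d):
    return 1.0 / d


def inverse_distance_squared(d):
    return 1.0 / d ** 2


def range_minus_distance(a):
    """Weight a - d for samples closer than a, zero beyond it."""
    return lambda d: max(a - d, 0.0)


# Hypothetical layout: A at the origin, one near sample and one far sample.
A = (0.0, 0.0)
samples = [(1.0, 0.0, 10.0), (3.0, 0.0, 20.0)]
for name, w in [("1/d", inverse_distance),
                ("1/d^2", inverse_distance_squared),
                ("a - d, a = 4", range_minus_distance(4.0))]:
    print(name, weighted_estimate(A, samples, w))
```

With these made-up numbers the three schemes give 12.5, 11 and 12.5 respectively (up to floating-point rounding). The nearer sample, grade 10, pulls every estimate below the simple average of 15, most strongly under inverse-distance squared.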
There are some problems with this approach. Which weighting
factors are the best to choose? How far do you go in including samples -- if
there is a sample 6 which is twice as far away as sample 5, should it be
included? How reliable is the estimate when we get it? Can we seriously expect
the same estimation method to be equally valid on all types of deposits? On the
other hand, the idea of weighting samples by some measure of their similarity
to what is being estimated is intuitively appealing. It also seems to avoid
those crippling restrictions on what distribution of values you can handle,
which so limit the other methods of estimation. ‘Similarity’ can be measured
statistically by the covariance between samples or by their correlation.
However, to calculate either of these we have to go back to ‘stationarity’ type assumptions. Let us look instead at the difference between the samples.
In Fig 1.1 it seems sound to expect that the value at
position 5 will be ‘very different’ from that at A,
whilst sample 1, say, will have a value ‘not very
different’ from that at A. Let us make an assumption that the difference
in value between two positions in the deposit
depends only on the distance between them and their relative orientation.
Suppose we took a pair of samples 50ft apart on a north-south line in one part
of the deposit, and measured the difference between the two values. Now,
suppose we did the same, say, 200ft away. And again in yet another position,
and so on. The value obtained (difference in grade) would be different for each
pair of samples, but under our assumptions all of these values would be from
the same probability distribution. Thus, if we could take enough such pairs, we could build up a
histogram of the differences and investigate the distribution from which they
were drawn. We would expect that distribution to be governed by the distance
between the pair and the relative orientation, i.e. 50ft, north-south.
Effectively, we have worked the implicit assumptions of the distance weighting
techniques into a statistical form.
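The pair-collecting thought experiment can be written out directly for samples on a regular grid. In this sketch all names and numbers are hypothetical. Grades are held in a dictionary keyed by (east, north) coordinates in feet, and h is a separation vector such as (0, 50) for ‘50ft, north-south’. Exact coordinate matching only works for gridded data; irregularly spaced samples would need distance tolerances, which this sketch ignores.

```python
import math
from collections import Counter


def pair_differences(grades, h):
    """Return g(x) - g(x+h) for every sampled x whose partner x+h was also sampled.

    grades -- dict mapping (east, north) in feet to a grade
    h      -- separation vector (d_east, d_north), e.g. (0, 50)
    """
    d_east, d_north = h
    diffs = []
    for (east, north), g in grades.items():
        partner = (east + d_east, north + d_north)
        if partner in grades:
            diffs.append(g - grades[partner])
    return diffs


def histogram(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical grades on a 50ft grid: two north-south lines of three samples.
grades = {
    (0, 0): 1.0, (0, 50): 2.0, (0, 100): 4.0,
    (50, 0): 3.0, (50, 50): 3.5, (50, 100): 2.5,
}
diffs = pair_differences(grades, (0, 50))  # pairs 50ft apart, north-south
print(diffs)
print(histogram(diffs, bin_width=1.0))
```

Four pairs make a trivial histogram. The point is only that one separation vector h yields one collection of differences, and a different h would yield another.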
However, we will have one histogram for every different
distance and direction in the deposit. To build up any useful picture of the
deposit we need as many different distances and directions as possible. To investigate
a histogram for each would be tedious and would overwhelm us with not terribly
useful information. Let us resort to the usual trick of summarising the
histogram in a couple of simple parameters. The usual ones are the arithmetic
mean (average) and the variance, or equivalently the standard deviation.
Suppose, for shorthand, we describe the distance between the samples and the
relative orientation as h. We have said that the difference
in grade between the two samples depends only on h. In statistical terms, the distribution of the
differences depends only on h. If this is true of the whole
distribution, it is also true of its mean and variance. That is, we can
describe the mean difference in grade as m(h) and the variance of these
differences as 2γ(h). If we had a set of pairs of
samples for a specific h (say 50ft, north-south) then we could
calculate an ‘experimental’ value for m(h).
$$m^*(h) = \frac{1}{n}\sum \left[ g(x) - g(x+h) \right]$$
where g stands for grade, x denotes the position of one sample in the pair and x+h the position of the other, and n is the number of pairs which we have. You will have
noticed the introduction of the ‘*’ to
show that this is something we have calculated rather than something ‘theoretical’. Unfortunately, it can be
shown that this is not a very good way of estimating m(h), and that finding a better one involves
intense mathematical complications. Let us look more closely at m(h) itself. It represents an average
difference in grades between two samples -- in other words, an ‘expected’
difference. If m(h) is zero, this implies that we
‘expect’ no difference between grades a distance h apart. Put another way, we ‘expect’ the same sort of
grades over an area of the deposit which is at least as large as h. In jargon terms, locally (within h) there is no trend. It is a convenient assumption to
make for our purposes, so we will assume that there is no trend within the
scale in which we are interested. We will see later what happens if this is not
true.
Having rid ourselves of m(h), let us turn to the variance of the
differences. This has been called 2γ(h) and is usually known as the variogram,
since it varies with the distance (and direction) h. In practice, having made our no-trend assumption, we
can calculate:
$$2\gamma^*(h) = \frac{1}{n}\sum \left[ g(x) - g(x+h) \right]^2$$
The ‘2’ in front of the γ is
there for mathematical convenience. The term γ(h) is called the semi-variogram (although
some authors sloppily call it the variogram), and γ*(h) is the experimental
semi-variogram; γ* bears the same relationship to γ that a
histogram does to a probability distribution.
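For samples at equal intervals along a single line, both experimental quantities reduce to simple averages over the pairs that are `lag` spacings apart. This sketch assumes that layout and uses made-up grades; the function name is hypothetical. It returns m*(h) as the average difference and γ*(h) as half the average squared difference, i.e. the 2γ* calculation divided by two.

```python
def experimental_stats(grades, lag):
    """Experimental mean difference m*(h) and semi-variogram gamma*(h).

    grades -- values at equally spaced positions along a line
    lag    -- separation in sample spacings, so h = lag * spacing
    Returns (m_star, gamma_star, n_pairs).
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    diffs = [grades[i] - grades[i + lag] for i in range(len(grades) - lag)]
    n = len(diffs)
    if n == 0:
        raise ValueError("no pairs of samples at this lag")
    m_star = sum(diffs) / n                           # m*(h): mean difference
    gamma_star = sum(d * d for d in diffs) / (2 * n)  # gamma*(h): half mean square
    return m_star, gamma_star, n


line = [1.0, 3.0, 2.0, 5.0, 4.0]  # hypothetical grades
for lag in range(3):
    print(lag, experimental_stats(line, lag))
```

At lag 0 every difference is zero, so γ* is zero, in line with the requirement that the semi-variogram pass through the origin.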
Having defined a semi-variogram, what sort of behaviour do
we expect it to have? We have a measure of the difference between the grades a
distance (and direction) h apart. The measure which we have
is in units of grade squared, e.g. (% by weight)², (p.p.m.)² and so on, and we calculate a value for the
experimental semi-variogram for as many different values of h as possible. The easiest way to display these figures
is in a graph -- hence the name semi-variogram. It is usual to plot the graph as
in Fig 1.2. That is, the distance between the pairs of samples is plotted along
the horizontal axis and the value of the semi-variogram along the vertical. By
definition h starts at zero, since it is impossible to
take two samples closer than no distance apart. The γ axis also
starts at zero, since it is an average of squared values.

Fig 1.2: Usual method of plotting a semi-variogram on a graph
Consider
the case when h is equal to zero. We take two samples at
exactly the same position and measure their values. The difference between the
two must be zero, so that γ and
γ* must always pass through the
origin of the graph. Now suppose we let the two samples move a little distance apart.
We would now expect some difference between the two values, so that the
semi-variogram will have some small positive value. As the samples move further
apart the differences should rise. In the ideal case when the distance becomes
very large the sample values will become independent of one another. The
semi-variogram value will then become more or less constant, since it will be
calculating the difference between sets of independent samples. This so-called
‘ideal shape’ for the semi-variogram is shown in Fig 1.3, and is to
Geostatistics as the Normal distribution is to statistics.

Fig 1.3: The ‘ideal’ shape for a semi-variogram -- the spherical model.
It is a
‘model’ semi-variogram and is usually called the spherical or Matheron model.
The distance at which samples become independent of one another is denoted by a and is called the range of influence of a sample. The
value of γ at
which the graph levels off is denoted by C and is called the sill of the
semi-variogram. The spherical model is given mathematically as:
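$$\gamma(h) = \begin{cases} C\left(\frac{3h}{2a} - \frac{h^3}{2a^3}\right) & \text{when } h \le a \\ C & \text{when } h > a \end{cases}$$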
This model was originally derived on theoretical grounds
(as was the Normal distribution) but has been found to be widely applicable in
practice.
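The spherical model, C(3h/(2a) - h³/(2a³)) for h up to a and C beyond, can be coded directly. In this small sketch the function and parameter names are illustrative: `c` is the sill C and `a` the range of influence.

```python
def spherical(h, c, a):
    """Spherical (Matheron) semi-variogram with sill c and range of influence a."""
    h = abs(h)
    if h >= a:
        return float(c)  # beyond the range, samples are independent: at the sill
    r = h / a
    return c * (1.5 * r - 0.5 * r ** 3)


for h in (0, 25, 50, 75, 100, 150):
    print(h, spherical(h, c=2.0, a=100.0))
```

The two branches meet at h = a, since 1.5 - 0.5 = 1, so the curve joins its sill smoothly in value.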
There are
many other possible models of semi-variograms, but only a few are commonly
used. One other model with a sill which seems to have found some application is
the exponential model. This is described by:
$$\gamma(h) = C\left[1 - \exp\left(-\frac{h}{a}\right)\right]$$
This model rises more slowly from the origin than the spherical
and never quite reaches its sill. Figure 1.4 shows the spherical and
exponential with the same range and sill. Figure 1.5 shows the two with the
same sill and the same initial slope for comparison. The reason for this will
become clear in the next chapter.
Fig 1.4: Comparison of the exponential and spherical models with the same range and sill
Fig 1.5: Comparison of the exponential and spherical models with the same initial slope and sill
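The two comparisons in Figs 1.4 and 1.5 can be checked numerically. Writing the exponential model as C[1 - exp(-h/a)], its slope at the origin is C/a, while the spherical model's is 3C/(2a); so an exponential with parameter 2a/3 starts out with the same slope as a spherical with range a. The function names in this sketch are illustrative.

```python
import math


def spherical(h, c, a):
    """Spherical semi-variogram with sill c and range a."""
    h = abs(h)
    if h >= a:
        return float(c)
    r = h / a
    return c * (1.5 * r - 0.5 * r ** 3)


def exponential(h, c, a):
    """Exponential semi-variogram: approaches the sill c but never reaches it."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
    return c * -math.expm1(-abs(h) / a)


c, a = 1.0, 100.0
a_same_slope = 2.0 * a / 3.0  # exponential parameter matching slope 3c/(2a) at h = 0

for h in (1, 50, 100, 300):
    print(h, spherical(h, c, a), exponential(h, c, a), exponential(h, c, a_same_slope))
```

With the same a, the exponential column stays below the spherical one near the origin; with parameter 2a/3 the two agree closely for small h.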
One of the interesting properties of models with a sill --
both mathematically and for the applications -- is that the sill value, C, is equal to the ordinary sample variance of the
grades. If you could take a set of random independent observations from the
deposit and calculate the sample variance:
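$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left( g_i - \bar{g} \right)^2$$
where $g_i$ are the $n$ grades and $\bar{g}$ is their average,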
then s² and C
will both be estimates of the same ‘true’ sample variance. The
relationship between s² and C will be seen later to have
wide-ranging consequences.
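A quick simulation shows why C and s² estimate the same quantity. If two values are independent (as samples are beyond the range of influence) and come from the same distribution, half their mean squared difference has expectation equal to their variance. The sketch below uses simulated Normal values with a hypothetical mean of 10 and standard deviation of 2 (variance 4) in place of real grades, and treats positions in the list as equally spaced samples.

```python
import random
import statistics


def semivariogram_at_lag(values, lag):
    """Experimental semi-variogram: half the mean squared difference of pairs."""
    diffs = [values[i] - values[i + lag] for i in range(len(values) - lag)]
    return sum(d * d for d in diffs) / (2 * len(diffs))


rng = random.Random(1)  # fixed seed so the run is repeatable
values = [rng.gauss(10.0, 2.0) for _ in range(20000)]  # independent draws, variance 4

s2 = statistics.variance(values)  # ordinary sample variance (n - 1 divisor)
for lag in (1, 10, 100):
    print(lag, round(semivariogram_at_lag(values, lag), 3), round(s2, 3))
```

Because every pair here is independent, each lag sits at the sill and each estimate falls close to s².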
There are also models which have no sill. The simplest of
these is the linear model:
$$\gamma(h) = p\,h$$
where p is the slope of the line. An extension of this model
is the ‘generalised linear’:
$$\gamma(h) = p\,h^{\alpha}$$
where α lies between 0 and 2 (but must not equal 2).
This model is shown in Fig 1.6 for various values of α.

Fig 1.6: The linear and the generalised linear model
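The linear and generalised linear models differ only in the exponent, so one function covers both; an exponent of 1 gives the plain linear model with slope p. This sketch names the exponent `alpha` (a hypothetical parameter name) and also rejects an exponent of exactly 0, since p·h⁰ would not be zero at the origin. That lower bound is an interpretation of ‘between 0 and 2’, not something the text states.

```python
def generalised_linear(h, p, alpha=1.0):
    """Generalised linear semi-variogram p * h**alpha; alpha = 1 is the linear model."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 0 and 2")
    return p * abs(h) ** alpha


for alpha in (0.5, 1.0, 1.5):
    print(alpha, [round(generalised_linear(h, 1.0, alpha), 2) for h in (0, 1, 4, 9)])
```

Exponents below 1 give curves that bend over like Fig 1.6's lower lines; exponents above 1 curve upwards.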
Another
model without a sill is the de Wijsian model:
$$\gamma(h) = 3\alpha \ln h$$
where α is a constant, so that the semi-variogram is linear if plotted against the
logarithm of the distance.
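A short check of the ‘straight line against log distance’ property, assuming the de Wijsian model is written 3α ln h with natural logarithms (another base would only rescale α): every doubling of h adds the same amount, 3α ln 2, to the semi-variogram. The formula is undefined at h = 0, so this sketch rejects non-positive distances rather than guessing how they should be handled.

```python
import math


def de_wijsian(h, alpha):
    """de Wijsian semi-variogram 3 * alpha * ln(h), defined here only for h > 0."""
    if h <= 0:
        raise ValueError("the de Wijsian formula needs a positive distance")
    return 3.0 * alpha * math.log(h)


alpha = 0.5
for h in (10, 20, 40, 80):
    print(h, round(de_wijsian(h, alpha), 4))
# Each doubling of h adds the same increment, 3 * alpha * ln 2.
```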
One other
model exists, to describe the semi-variogram of a purely random phenomenon.
Effectively, it is a spherical model with a very small range of influence. The
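‘nugget effect’, as it is called, is given by:
$$\gamma(h) = \begin{cases} 0 & \text{when } h = 0 \\ C_0 & \text{when } h > 0 \end{cases}$$
where $C_0$ is the constant value of the semi-variogram at any non-zero distance.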
Note that even with completely random phenomena the
semi-variogram must be zero at distance zero. Two samples measured at exactly the same position must have the same
value.
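The remark that the nugget effect is effectively a spherical model with a very small range can be checked directly. In this sketch `c0` (a hypothetical name) is the constant value taken for h > 0, and the tiny range of 1e-9 is arbitrary: apart from h = 0 itself, a spherical model with that range is already at its sill for any distance a real pair of samples could have.

```python
def spherical(h, c, a):
    """Spherical semi-variogram with sill c and range a."""
    h = abs(h)
    if h >= a:
        return float(c)
    r = h / a
    return c * (1.5 * r - 0.5 * r ** 3)


def nugget(h, c0):
    """Pure nugget effect: zero at h = 0, constant c0 at any other distance."""
    return 0.0 if h == 0 else float(c0)


c0, tiny_range = 2.0, 1e-9
for h in (0.0, 0.5, 50.0):
    print(h, nugget(h, c0), spherical(h, c0, tiny_range))
```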
In practice, many semi-variograms comprise a mixture of two
or more of these models and we shall see some of these in Chapter 2. To
summarise our introduction to Geostatistics, here are the basic assumptions
necessary for its application:
1. Differences between the values of
samples are determined only by the relative spatial orientation of those
samples.
2. We are really interested only in
the mean and variance of the differences, so our real contention is that these
two parameters depend only on the relative orientation. This is known as the
‘Intrinsic Hypothesis’.
3. For convenience we have assumed
that there is no trend in the deposit which is likely to affect values within
the scale of interest. Thus we are only interested in the variance of the
difference in value between the samples.
From
these assumptions we have produced the notion of a semi-variogram, and we have
discussed the sort of shape which we expect a semi-variogram to take. In the
next chapter we will look at the process of actually calculating an
experimental semi-variogram and trying to relate it to the models discussed.