Contents Page

Previous section

Isobel’s Home Page

Practical Geostatistics 2000

Courses

 

CHAPTER 1: Introduction

Perhaps it would be useful here at the very beginning to clear up any possible ambiguity which arises because of the use of the title Geostatistics. In the early 1960s, after much empirical work by authors in South Africa, Georges Matheron, now Head of the Centre de Morphologie Mathématique in Fontainebleau, France, published his treatise on the Theory of Regionalised Variables. The application of this theory to problems in geology and mining has led to the more popular name Geostatistics. The contents of this book are confined to the simplest application of the Theory of Regionalised Variables, that of producing the ‘best’ estimation of the unknown value at some location within an ore deposit. This technique is known as kriging. The purpose of this text is to provide a simple treatment of Geostatistics for the reader unfamiliar with the field. The subject may be discussed at a number of levels of mathematical complexity, and it is the intention here to keep the mathematics to a necessary minimum. Some previous knowledge must be assumed on the reader’s part of basic concepts of ordinary statistics such as mean, variance and standard deviation, confidence intervals and probability distributions. Readers without this background are referred to any one of a large number of excellent basic texts.


The estimation of ore reserves in mining is probably the best known application of Geostatistics. However, it has been emphasised time and again that the estimation techniques can be used wherever a continuous measure is made on a sample at a particular location in space (or time), i.e., where a sample value is expected to be affected by its position and its relationships with its neighbours. Since most applications -- and most of the author’s experience -- are confined to the mining field, so will most of the examples in this book. Also, there will be a tendency to talk of ‘grades’ rather than ‘sample values’, for brevity if nothing else. If the reader is interested in other fields, it suffices to replace ‘grade’ by porosity, permeability, thickness, elevation, population density, rainfall, temperature, fracture length, abundance or whatever.


The application of statistical methods to ore reserve problems was first attempted some 30 years ago in South Africa. The problem was that of predicting the grades within an area to be mined from a limited number of peripheral samples in development drives in the gold mines. Gold values are notoriously erratic, and when plotted in the form of a histogram show a highly skewed distribution with a very long tail into the rich grades. Normal (Gaussian) statistical theory will not handle such distributions unless a transformation is applied first. H. S. Sichel applied a log-normal distribution to the gold grades and achieved encouraging results. He then published formulae and tables to enable accurate calculation of local averages for log-normal variables, and also confidence limits on those local averages. Three major drawbacks exist in the application of Sichel’s ‘t’ estimator: the ‘background’ probability distribution must be log-normal; the samples must be independent; and no account is taken of the position of the samples -- all are equally important. However, the technique proved very useful in the gold mines, especially since some measure of the reliability of the estimator was provided. It also laid the base for further statistical work by providing the conceptual framework necessary, i.e., by assuming that the sample values came from some probability distribution. At this stage, it was assumed that all the samples (in a given area) came from the same probability distribution -- a log-normal one -- and this assumption is known in ordinary statistics as ‘stationarity’.


Subsequent to this work attempts were made to incorporate position and spatial relationships into the estimation procedure. Two things seemed sensible: there should be ‘rich’ areas and ‘poor’ areas within a deposit; and there should be some sort of relationship between one area and the next. These were tackled in the 1950s and early 1960s by the introduction of Trend Surface Analysis. In South Africa, trends were picked out by forming a ‘rolling mean’ which produced a smoothed map so that high and low areas could be distinguished. In the United States a ‘Polynomial Trend Surface’ analysis was propounded which used a statistical technique to fit a mathematical equation to describe the trend. Both methods have one thing in common -- the basic assumptions about the statistical characteristics of the deposit. These assumptions have been extended from the ‘stationarity’ one, by stating that the sample value is expected to vary from area to area in the deposit. Some areas are expected to be rich, some to be poor. This expectation can be expressed as a reasonably smooth variation, either by a smoothed map or a relatively simple equation. Round about this trend there is expected to be random variation. That is, the value at any point in the deposit is supposed to comprise (i) a ‘fixed’ component of the trend (which is probably unknown), and (ii) a random variable following one specific distribution. Thus the stationarity has been shifted one step; the expected grade may vary slowly, but the random component is ‘stationary’. We have also dropped the log-normality assumption. This approach is quite useful for an overview of the deposit, but, except in heavily sampled areas like the gold mines, is not really useful for local estimation.

Fig 1.1 Hypothetical sampling and estimation situation

Let us consider the problem of local estimation, e.g. of trying to estimate the value at, say, point A in Fig 1.1, given the samples at the various locations shown. It seems reasonable to evolve an estimation procedure which gives more importance to sample 1 than to sample 5. A whole range of methods have been produced to decide on the ‘weight’ accorded to each sample, mostly based on the distance of the sample from the point being estimated. Sample values may be weighted by inverse-distance, inverse-distance squared, or by some arbitrary constant (e.g. range of influence) minus the distance. All of these involve the same basic assumption -- that the relationship between the value at point A and any sample value depends on the distance (and possibly direction) between the two positions, and on nothing else. It does not depend on whether one is in a rich or poor zone, or on the actual sample values, but only on the geometric placing of the samples. In fact, it does not even depend on the mineral in the deposit!
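The distance-weighting idea can be made concrete with a short sketch. The Python below is an illustration only, not a method taken from this book: the function name, the sample coordinates and the grades are all hypothetical, and the weights are simply 1/distance raised to a chosen power, rescaled to sum to one.

```python
import math

def inverse_distance_estimate(target, samples, power=2.0):
    """Weighted average of sample values, with weights 1 / distance**power.

    target  -- (x, y) of the point to estimate (point A in the text)
    samples -- iterable of (x, y, value) tuples
    power   -- 1 for inverse distance, 2 for inverse distance squared
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # a sample sitting exactly on the target
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total

# Hypothetical layout: one near sample and one far sample (arbitrary units).
samples = [(10.0, 0.0, 2.0), (0.0, 20.0, 4.0)]
estimate_idw = inverse_distance_estimate((0.0, 0.0), samples, power=1.0)
estimate_ids = inverse_distance_estimate((0.0, 0.0), samples, power=2.0)
print(estimate_idw, estimate_ids)
```

Note that the weights are computed from the coordinates alone; the grades enter only in the final average, which is exactly the assumption described above.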


There are some problems with this approach. Which weighting factors are the best to choose? How far do you go in including samples -- if there is a sample 6 which is twice as far away as sample 5, should it be included? How reliable is the estimate when we get it? Can we seriously expect the same estimation method to be equally valid on all types of deposits? On the other hand, the idea of weighting samples by some measure of their similarity to what is being estimated is intuitively appealing. It also seems to avoid those crippling restrictions on what distribution of values you can handle, which so limit the other methods of estimation. ‘Similarity’ can be measured statistically by the covariance between samples or by their correlation. However, to calculate either of these we have to go back to ‘stationarity’ type assumptions. Let us look instead at the difference between the samples.


In Fig 1.1 it seems sound to expect that the value at position 5 will be ‘very different’ from that at A, whilst sample 1, say, will have a value ‘not very different’ from that at A. Let us make an assumption that the difference in value between two positions in the deposit depends only on the distance between them and their relative orientation. Suppose we took a pair of samples 50ft apart on a north-south line in one part of the deposit, and measured the difference between the two values. Now, suppose we did the same, say, 200ft away. And again in yet another position, and so on. The value obtained (difference in grade) would be different for each pair of samples, but under our assumptions all of these values would be from the same probability distribution. Thus, if we could take enough such pairs, we could build up a histogram of the differences and investigate the distribution from which they were drawn. We would expect that distribution to be governed by the distance between the pair and the relative orientation, i.e. 50ft, north-south. Effectively, we have worked the implicit assumptions of the distance weighting techniques into a statistical form.


However, we will have one histogram for every different distance and direction in the deposit. To build up any useful picture of the deposit we need as many different distances and directions as possible. To investigate a histogram for each would be tedious and would overwhelm us with not terribly useful information. Let us resort to the usual trick of summarising the histogram in a couple of simple parameters. The usual ones are the arithmetic mean (average) and the variance, or equivalently the standard deviation. Suppose, for shorthand, we describe the distance between the samples and the relative orientation as h. We have said that the difference in grade between the two samples depends only on h. In statistical terms, the distribution of the differences depends only on h. If this is true of the whole distribution, it is also true of its mean and variance. That is, we can describe the mean difference in grade as m(h) and the variance of these differences as 2γ(h). If we had a set of pairs of samples for a specific h (say 50ft, north-south) then we could calculate an ‘experimental’ value for m(h):

$$m^*(h) = \frac{1}{n} \sum \left[ g(x) - g(x+h) \right]$$

where g stands for grade, x denotes the position of one sample in the pair and x+h the position of the other, and n is the number of pairs which we have. You will have noticed the introduction of the ‘*’ to show that this is something we have calculated rather than something ‘theoretical’. Unfortunately, it can be shown that this is not a very good way of estimating m(h), and that to get a good way involves intense mathematical complications. Let us look closer at m(h) itself. It represents an average difference in grades between two samples -- in other words, an ‘expected’ difference. If m(h) is zero, this implies that we ‘expect’ no difference between grades a distance h apart. Put another way, we ‘expect’ the same sort of grades over an area of the deposit which is at least as large as h. In jargon terms, locally (within h) there is no trend. It is a convenient assumption to make for our purposes, so we will assume that there is no trend within the scale in which we are interested. We will see later what happens if this is not true.


Having rid ourselves of m(h), let us turn to the variance of the differences. This has been called 2γ(h) and is usually known as the variogram, since it varies with the distance (and direction) h. In practice, having made our no-trend assumption, we can calculate:

$$2\gamma^*(h) = \frac{1}{n} \sum \left[ g(x) - g(x+h) \right]^2$$

The ‘2’ in front of the γ is there for mathematical convenience. The term γ(h) is called the semi-variogram (although some authors sloppily call it the variogram), and γ*(h) is the experimental semi-variogram; γ* bears the same relationship to γ that a histogram does to a probability distribution.
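As a small illustration of the calculation, the sketch below computes the mean difference and the experimental semi-variogram, i.e. half the average squared difference between the grades of paired samples. It assumes the simplest possible layout, samples equally spaced along a single line, so that h is a whole number of sample spacings; the function names and the five grades are invented for the example.

```python
def mean_difference(grades, lag):
    """Experimental m*(h): average of g(x) - g(x+h) over all pairs `lag` apart."""
    diffs = [grades[i] - grades[i + lag] for i in range(len(grades) - lag)]
    if not diffs:
        raise ValueError("no pairs of samples at this lag")
    return sum(diffs) / len(diffs)

def experimental_semivariogram(grades, lag):
    """Experimental semi-variogram: half the mean squared difference in grade."""
    diffs = [grades[i] - grades[i + lag] for i in range(len(grades) - lag)]
    if not diffs:
        raise ValueError("no pairs of samples at this lag")
    return sum(d * d for d in diffs) / (2 * len(diffs))

# Hypothetical grades at five equally spaced positions along a line.
grades = [1.0, 3.0, 2.0, 5.0, 4.0]
for lag in (1, 2, 3):
    print(lag, mean_difference(grades, lag), experimental_semivariogram(grades, lag))
```

With so few samples the number of pairs falls quickly as the lag grows (four pairs at lag 1, two at lag 3), so values at the longer lags rest on very little data.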


Having defined a semi-variogram, what sort of behaviour do we expect it to have? We have a measure of the difference between the grades a distance (and direction) h apart. The measure which we have is in units of grade squared, e.g. (% by weight)², (p.p.m.)² and so on, and we calculate a value for the experimental semi-variogram for as many different values of h as possible. The easiest way to display these figures is in a graph -- hence the name semi-variogram. It is usual to plot the graph as in Fig 1.2. That is, the distance between the pairs of samples is plotted along the horizontal axis and the value of the semi-variogram along the vertical. By definition h starts at zero, since it is impossible to take two samples closer than no distance apart. The γ axis also starts at zero, since it is an average of squared values.

 

Fig 1.2: Usual method of plotting a semi-variogram on a graph

 

Consider the case when h is equal to zero. We take two samples at exactly the same position and measure their values. The difference between the two must be zero, so that γ and γ* must always pass through the origin of the graph. Now suppose we let the two samples move a little distance apart. We would now expect some difference between the two values, so that the semi-variogram will have some small positive value. As the samples move further apart the differences should rise. In the ideal case when the distance becomes very large the sample values will become independent of one another. The semi-variogram value will then become more or less constant, since it will be calculating the difference between sets of independent samples. This so-called ‘ideal shape’ for the semi-variogram is shown in Fig 1.3, and is to Geostatistics as the Normal distribution is to statistics.

 

Fig 1.3: The ‘ideal’ shape for a semi-variogram -- the spherical model.

 

It is a ‘model’ semi-variogram and is usually called the spherical or Matheron model. The distance at which samples become independent of one another is denoted by a and is called the range of influence of a sample. The value of γ at which the graph levels off is denoted by C and is called the sill of the semi-variogram. The spherical model is given mathematically as:

$$\gamma(h) = \begin{cases} C \left[ \dfrac{3}{2} \dfrac{h}{a} - \dfrac{1}{2} \dfrac{h^3}{a^3} \right] & \text{when } h \le a \\[2ex] C & \text{when } h > a \end{cases}$$

This model was originally derived on theoretical grounds (as was the Normal distribution) but has been found to be widely applicable in practice.
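A minimal sketch of the spherical model as a function may help when reading the later chapters. It uses the standard form of the model, C[1.5(h/a) - 0.5(h/a)³] up to the range a and C beyond it; the function name, argument names and the numbers in the example are illustrative choices rather than anything prescribed here.

```python
def spherical(h, sill, range_a):
    """Spherical semi-variogram model with sill C and range of influence a."""
    if h < 0:
        raise ValueError("distance h must be non-negative")
    if range_a <= 0:
        raise ValueError("range of influence must be positive")
    if h >= range_a:
        return sill  # beyond the range the samples are independent
    ratio = h / range_a
    return sill * (1.5 * ratio - 0.5 * ratio ** 3)

# Hypothetical parameters: sill 2.0 (grade squared), range 100 ft.
for h in (0, 25, 50, 100, 150):
    print(h, spherical(h, sill=2.0, range_a=100.0))
```

The two branches meet smoothly: at h = a the bracket equals 1.5 - 0.5 = 1, so the curve reaches the sill exactly at the range.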

 

There are many other possible models of semi-variograms, but only a few are commonly used. One other model with a sill which seems to have found some application is the exponential model. This is described by:

$$\gamma(h) = C \left[ 1 - \exp\left( -\frac{h}{a} \right) \right]$$

This model rises more slowly from the origin than the spherical and never quite reaches its sill. Fig 1.4 shows the spherical and exponential with the same range and sill. Fig 1.5 shows the two with the same sill and the same initial slope for comparison. The reason for this will become clear in the next chapter.

 

Fig 1.4: Comparison of the exponential and spherical models with the same range and sill

 

Fig. 1.5: Comparison of the exponential and spherical models with the same initial slope and sill.
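The two comparisons in Figs 1.4 and 1.5 can be reproduced numerically. The sketch below assumes the exponential model has the form C[1 - exp(-h/a)] and the spherical model its standard form; under those assumptions the initial slopes are C/a and 1.5C/a respectively, so an exponential with parameter a/1.5 starts off with the same slope as a spherical with range a. All names and numbers are illustrative.

```python
import math

def spherical(h, sill, range_a):
    """Spherical model: reaches the sill exactly at h = range_a."""
    if h >= range_a:
        return sill
    ratio = h / range_a
    return sill * (1.5 * ratio - 0.5 * ratio ** 3)

def exponential(h, sill, a):
    """Exponential model C[1 - exp(-h/a)]; approaches the sill but never reaches it."""
    return -sill * math.expm1(-h / a)  # expm1 keeps precision for small h

sill, range_a = 1.0, 100.0
matched_a = range_a / 1.5  # exponential parameter giving the same initial slope

print(" h  spherical  exp(same a)  exp(same slope)")
for h in (1, 10, 50, 100, 200):
    print(h,
          round(spherical(h, sill, range_a), 4),
          round(exponential(h, sill, range_a), 4),
          round(exponential(h, sill, matched_a), 4))
```

With the same a, the exponential has covered only about 63% of the sill at h = a, where the spherical has already levelled off.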


One of the interesting properties of models with a sill -- both mathematically and for the applications -- is that the sill value, C, is equal to the ordinary sample variance of the grades. If you could take a set of n random independent observations from the deposit and calculate the sample variance:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( g_i - \bar{g} \right)^2$$

where ḡ is the arithmetic mean of the n grades, then s² and C will both be estimates of the same ‘true’ sample variance. The relationship between s² and C will be seen later to have wide-ranging consequences.
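A short worked derivation suggests why the sill and the variance should coincide. It is a sketch under stronger assumptions than the text has made so far: that all grades share a common mean and a finite variance σ², and that the covariance between two grades depends only on h (the ‘stationarity’ type assumptions mentioned earlier). The symbols σ² and Cov(h) are introduced here for illustration only.

```latex
\begin{align*}
2\gamma(h) &= E\left[ \{ g(x) - g(x+h) \}^2 \right] \\
           &= \operatorname{Var}[g(x)] + \operatorname{Var}[g(x+h)]
              - 2\operatorname{Cov}[g(x),\, g(x+h)] \\
           &= 2\sigma^2 - 2\operatorname{Cov}(h) \\
\gamma(h)  &= \sigma^2 - \operatorname{Cov}(h)
\end{align*}
```

Once h exceeds the range of influence the samples are independent, so Cov(h) = 0 and γ(h) = σ², which is the sill C.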


There are also models which have no sill. The simplest of these is the linear model:

$$\gamma(h) = p\,h$$

where p is the slope of the line. An extension of this model is the ‘generalised linear’:

$$\gamma(h) = p\,h^{\alpha}$$

where α lies between 0 and 2 (but must not equal 2). This model is shown in Fig 1.6 for various values of α.

 

Fig 1.6. The linear and the generalised linear model

 

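Another model without a sill is the de Wijsian model:

$$\gamma(h) = 3\alpha \ln(h)$$

in which α is a constant for the deposit (not the exponent of the generalised linear model) and the semi-variogram is linear if plotted against the logarithm of the distance.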

 

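One other model exists, to describe the semi-variogram of a purely random phenomenon. Effectively, it is a spherical model with a very small range of influence. The ‘nugget effect’ as it is called, is given by:

$$\gamma(h) = \begin{cases} 0 & \text{when } h = 0 \\ C_0 & \text{when } h > 0 \end{cases}$$

where C₀ is a constant.
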
Note that even with completely random phenomena the semi-variogram must be zero at distance zero. Two samples measured at exactly the same position must have the same value.
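A quick simulation illustrates what a purely random phenomenon looks like. The sketch below draws independent values from a Normal distribution (an arbitrary choice made for convenience, with an arbitrary seed, mean and standard deviation), treats them as equally spaced samples along a line, and compares the experimental semi-variogram at a few lags with the ordinary sample variance.

```python
import random
import statistics

def experimental_semivariogram(grades, lag):
    """Half the mean squared difference between samples `lag` positions apart."""
    diffs = [grades[i] - grades[i + lag] for i in range(len(grades) - lag)]
    if not diffs:
        raise ValueError("no pairs of samples at this lag")
    return sum(d * d for d in diffs) / (2 * len(diffs))

rng = random.Random(0)  # fixed seed so the run is repeatable
grades = [rng.gauss(10.0, 2.0) for _ in range(5000)]  # independent 'grades'
sample_variance = statistics.variance(grades)

print("sample variance:", round(sample_variance, 3))
for lag in (1, 5, 20):
    print("lag", lag, "semi-variogram:", round(experimental_semivariogram(grades, lag), 3))
```

Under these assumptions the semi-variogram values should scatter around the sample variance at every lag, with no rise away from the origin: the graph is flat apart from the forced zero at h = 0.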


In practice, many semi-variograms comprise a mixture of two or more of these models and we shall see some of these in Chapter 2. To summarise our introduction to Geostatistics, here are the basic assumptions necessary for its application:

 

1.    Differences between the values of samples are determined only by the relative spatial orientation of those samples.

 

2.    We are really interested only in the mean and variance of the differences, so our real contention is that these two parameters depend only on the relative orientation. This is known as the ‘Intrinsic Hypothesis’.

 

3.    For convenience we have assumed that there is no trend on the deposit which is likely to affect values within the scale of interest. Thus we are only interested in the variance of the difference in value between the samples.

From these assumptions we have produced the notion of a semi-variogram, and we have discussed the sort of shape which we expect a semi-variogram to take. In the next chapter we will look at the process of actually calculating an experimental semi-variogram and trying to relate it to the models discussed.
