
Practical Geostatistics 2000


CHAPTER 4: Estimation

So far we have used the basic concepts and assumptions of Geostatistics to build ourselves a ‘model’ of the structure and continuity within the deposit. We have also (in Chapter 3) seen how this can lead to the production of ‘theoretical’ grade/tonnage curves and the study of how mining block size can influence final production figures. It is now time we returned to our original problem of the estimation of ore reserves. The discussion in this (and the next) chapter will be confined to ‘local’ estimation, i.e. interest is confined to one portion of the deposit at a time. However, it should be borne in mind that the same techniques can be applied on a global scale, i.e. to the whole deposit at once. It should also be remembered that block-by-block or stope-by-stope estimates will lead inevitably to global estimates.

Let us, then, define the situation which is of interest to us. There is a point or an area or a volume of ground over which we do not know the grade (or value), but we wish to estimate it. Let us call this ‘unknown’ grade T, and the area (or point, or volume) of interest A. In order to produce an estimator we must have some information, usually in the form of samples. To be completely general, let us suppose n samples with values g1, g2, g3, ..., gn. This set of samples is generally denoted by S. From these samples we can form a ‘linear’ type of estimator, that is, a weighted average. We must restrict ourselves to this type of estimator at this stage. The estimator is denoted by T* and is equal to:

$$T^* = w_1 g_1 + w_2 g_2 + w_3 g_3 + \dots + w_n g_n = \sum_{i=1}^{n} w_i g_i$$

where w1, w2, w3, ..., wn are the weights assigned to each sample. Most currently used local estimation techniques use a weighted average approach: inverse distance techniques and so on. The simplest case of all is when all of the weights are the same, and T* is just the arithmetic mean of the sample values.

 

Fig 4.1. Hypothetical sampling and estimation situation --- a uranium deposit.

Table 4.1 Positions and values on hypothetical Uranium estimation problem

Point    Easting (ft)    Northing (ft)    Grade U3O8 (p.p.m.)
A        4150            2340             unknown
1        4170            2332             400
2        4200            2340             380
3        4160            2370             450
4        4150            2310             280
5        4080            2340             320
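The weighted-average estimator can be sketched on the values of Table 4.1. The block below is only an illustration: the function name `weighted_average`, the normalisation of the weights to sum to one, and the choice of inverse distance squared as the second weighting scheme are assumptions made here, not something prescribed by the text.

```python
import math

# Coordinates (ft) and U3O8 grades (p.p.m.) taken from Table 4.1.
target = (4150.0, 2340.0)  # point A, whose grade is unknown
samples = [
    (4170.0, 2332.0, 400.0),
    (4200.0, 2340.0, 380.0),
    (4160.0, 2370.0, 450.0),
    (4150.0, 2310.0, 280.0),
    (4080.0, 2340.0, 320.0),
]

def weighted_average(grades, weights):
    """Linear estimator T* = sum(w_i * g_i); weights are rescaled to sum to 1 (assumption)."""
    total = sum(weights)
    return sum(w * g for w, g in zip(weights, grades)) / total

grades = [g for _, _, g in samples]
distances = [math.hypot(x - target[0], y - target[1]) for x, y, _ in samples]

# Equal weights: T* reduces to the arithmetic mean of the sample values.
t_mean = weighted_average(grades, [1.0] * len(grades))

# Inverse distance squared weights (the power 2 is an illustrative choice).
t_idw = weighted_average(grades, [1.0 / d**2 for d in distances])

print(f"arithmetic mean:          {t_mean:.1f} p.p.m.")
print(f"inverse distance squared: {t_idw:.1f} p.p.m.")
```

Both estimators are weighted averages of the same five grades; only the weights differ, which is exactly the freedom the rest of the chapter sets out to exploit.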

 

Now consider the setup of samples and ‘unknown’ which we originally discussed in the first chapter. Figure 4.1 shows the point of interest which lies at position A, and we have five ‘point’ samples lying around this position. The co-ordinates of these six points and the values of the samples are given in Table 4.1. The hypothetical deposit is a low-grade, large-tonnage uranium one, which is assumed to be isotropic. The semi-variogram model fitted to this deposit is a spherical one with a range of influence of 100 ft, a sill value (C) of 700 (p.p.m.)² and a nugget effect of 100 (p.p.m.)². Let us take the simplest possible estimation procedure. Take the value at the closest sample position (1) and ‘extend’ this to the unknown point. In doing so we incur an estimation error, $\varepsilon$, which will be equal to the difference between the actual value T and the estimated value T*, which in this case equals g1. That is:

$$\varepsilon = T - T^* = T - g_1$$


It is not too difficult to show that if there is no trend (at least locally), this estimator is unbiased. That is, if we make lots of similar estimations the average error will be zero.


The ‘reliability’ of the estimation can be measured by looking at the spread of the errors. If the errors take values consistently close to zero, then the estimator is a ‘good’ one. If the spread of values is large, then the estimator will be unreliable. The simplest stable measure of spread (statistically) is the standard deviation. The standard deviation of an estimation error --- or standard error as it is referred to in ordinary statistics --- will therefore measure the reliability of that estimator.

 

No matter how many estimations we perform, we cannot calculate the standard deviation of the errors since we do not know the values of the errors made. Therefore we must look at the ‘theoretical’ form of the variance of the estimation error, i.e. the estimation variance:

$$\sigma_\varepsilon^2 = \text{average of } (T - T^*)^2 = \text{average of } (T - g_1)^2$$
The average would be made (theoretically) over the whole deposit. That is, the same estimation situation would be repeated over the whole deposit and the variance found. This cannot be done in practice, of course, so let us look closer at the form of this variance. It is found by taking the grade at point A, subtracting the grade at point 1, squaring the result, repeating the process over all possible pairs of such points and then averaging the values. This sounds exactly like the definition of a variogram. In fact, it is the variogram between the two points A and (1). Given the distance between them (h) we can evaluate this estimation variance simply by reading a value from the semi-variogram model ($\gamma$) and multiplying it by 2. This is one of the reasons why it is good policy to avoid confusing the variogram and the semi-variogram. Thus:

$$\sigma_\varepsilon^2 = 2\gamma(h)$$

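In the case of our particular example given in Fig. 4.1, the distance between A and sample 1 is h = √(20² + 8²) = 21.54 ft, so that:

$$\sigma_\varepsilon^2 = 2\gamma(21.54) = 2\left[100 + 700\left(\frac{3}{2}\cdot\frac{21.54}{100} - \frac{1}{2}\cdot\frac{21.54^3}{100^3}\right)\right] = 645.4\ \text{(p.p.m.)}^2, \qquad \sigma_\varepsilon = 25.4\ \text{p.p.m.}$$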

Given our knowledge about this deposit, i.e. the semi-variogram model, we can state (without too much fear of error) that the estimator used has a standard error of 25.4 p.p.m. Turning this standard error into a confidence interval, however, requires the assumption of some kind of probability distribution for the errors. For instance, if we hope that the Central Limit Theorem holds, we can say that a 95% confidence interval for T would be given by $T^* \pm 1.96\sigma_\varepsilon$, i.e. (350 p.p.m., 450 p.p.m.). On the other hand, if we were to assume a log-normal distribution for the errors, the 95% confidence interval would be given by (354 p.p.m., 453 p.p.m.).
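The arithmetic behind the 25.4 p.p.m. figure can be retraced with a short sketch. It assumes the spherical model quoted in the text (range 100 ft, C = 700, nugget 100) and the coordinates of A and sample 1 from Table 4.1; the function and variable names are invented for the illustration, and only the Normal-theory interval is reproduced.

```python
import math

A_RANGE = 100.0  # range of influence, ft
SILL_C = 700.0   # sill of the spherical component, (p.p.m.)^2
NUGGET = 100.0   # nugget effect, (p.p.m.)^2

def spherical_gamma(h):
    """Semi-variogram value at distance h: nugget plus spherical component, zero at h = 0."""
    if h == 0:
        return 0.0
    if h >= A_RANGE:
        return NUGGET + SILL_C
    r = h / A_RANGE
    return NUGGET + SILL_C * (1.5 * r - 0.5 * r**3)

# Distance between point A (4150, 2340) and sample 1 (4170, 2332).
h = math.hypot(4170.0 - 4150.0, 2332.0 - 2340.0)

est_var = 2.0 * spherical_gamma(h)  # estimation variance = variogram = 2 * semi-variogram
std_err = math.sqrt(est_var)

t_star = 400.0  # grade of sample 1, used as the estimate
ci_95 = (t_star - 1.96 * std_err, t_star + 1.96 * std_err)

print(f"h = {h:.2f} ft, variance = {est_var:.1f}, standard error = {std_err:.1f} p.p.m.")
print(f"95% interval (Normal assumption): {ci_95[0]:.0f} to {ci_95[1]:.0f} p.p.m.")
```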

 

Fig. 4.2. More realistic estimation --- the value of the block is required (uranium deposit).

 

Now, let us complicate the procedure a little. Instead of estimating the value at the point A, in a more realistic situation (at least in mining) we would be interested in the average grade over an area or block or some mining unit. In Fig. 4.2, a ‘panel’ 60 ft by 30 ft has been centred on the original point A. The estimation procedure then becomes:

$$T = \text{average grade over the panel } A, \qquad T^* = g_1, \qquad \varepsilon = T - T^*$$

The same arguments as previously still hold. The average error can be shown to be zero if there is no local trend. The estimation variance is still a variogram, but it is now the variogram between the grade at sample point (1) and the average grade over the panel A. We saw in Chapter 3 that we could cope with average grades over samples if we wanted the semi-variogram between samples of the same size, but so far we have not considered the possibility of having two different sizes to compare. The model semi-variogram supplies us with the difference in grades between two points. We could find the value of the semi-variogram between the sample point and every point within the panel A, and we could average those values. Let us define this quantity as $\bar{\gamma}(S,A)$, read as ‘gamma-bar between the sample and every point in the panel’. The ‘bar’ notation is the standard one for arithmetic mean. This gamma-bar term will take the place of the $\gamma(h)$ in our previous relationship. However, what we really need is the semi-variogram between the average grade of the panel and the sample, not between all the individual points within the panel and the sample. $2\bar{\gamma}(S,A)$ would be the variance of the error made if we tried to estimate every point within the panel. To correct for this difference in emphasis we need to take into account the variation of the grades at points within the panel.

This was discussed in Chapter 3, and we evaluated it using the auxiliary function F(l,b). This was the average semi-variogram between all possible pairs of points within the panel. We can rewrite this in a more general way using the gamma-bar notation. That is, $\bar{\gamma}(A,A)$ will be the average semi-variogram value between every point in the panel and every point in the panel. In the case shown in Fig. 4.2, then, when using the value at sample point (1) to estimate the average grade of the panel, the estimation variance becomes:

$$\sigma_\varepsilon^2 = 2\bar{\gamma}(S,A) - \bar{\gamma}(A,A)$$

The calculation of these gamma-bar terms will be discussed more fully later.
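Ahead of the analytical treatment, the two gamma-bar terms for the panel of Fig. 4.2 can be approximated numerically by replacing the panel with a grid of points and averaging the semi-variogram over all pairs. This is a rough sketch under stated assumptions: the panel is taken to be 60 ft east-west by 30 ft north-south, the 12 by 6 grid is an arbitrary discretisation, and the helper names are hypothetical.

```python
import numpy as np

A_RANGE, SILL_C, NUGGET = 100.0, 700.0, 100.0  # spherical model quoted in the text

def gamma(h):
    """Spherical semi-variogram with nugget; exactly zero at h = 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / A_RANGE, 1.0)
    structured = SILL_C * (1.5 * r - 0.5 * r**3)
    return np.where(h > 0, NUGGET + structured, 0.0)

def panel_points(cx, cy, width, height, nx=12, ny=6):
    """Cell-centre grid of nx * ny points standing in for the continuous panel."""
    xs = cx - width / 2 + (np.arange(nx) + 0.5) * width / nx
    ys = cy - height / 2 + (np.arange(ny) + 0.5) * height / ny
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

def gamma_bar(p, q):
    """Average semi-variogram between every point in p and every point in q."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
    return float(gamma(d).mean())

panel = panel_points(4150.0, 2340.0, 60.0, 30.0)  # orientation is an assumption
sample = np.array([[4170.0, 2332.0]])             # sample 1 from Table 4.1

g_sa = gamma_bar(sample, panel)
# Coincident grid points contribute gamma(0) = 0, so the nugget is slightly
# under-counted in this term (by about NUGGET / 72 with this grid).
g_aa = gamma_bar(panel, panel)

est_var = 2.0 * g_sa - g_aa
print(f"gamma-bar(S,A) = {g_sa:.1f}, gamma-bar(A,A) = {g_aa:.1f}")
print(f"estimation variance = {est_var:.1f} (p.p.m.)^2")
```

A finer grid brings the averages closer to the continuous integrals, at the cost of an all-pairs distance matrix that grows with the square of the number of grid points.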


Now, let us complicate the mathematics still further. We actually have more than one sample available to us, so why not use them in the estimation procedure? Suppose we use the arithmetic mean of the samples as our T*. This gives us the simplest form of the weighted average type of estimator. That is:

$$T^* = \frac{1}{n}\sum_{i=1}^{n} g_i = \frac{g_1 + g_2 + g_3 + g_4 + g_5}{5}$$

In this case the term $\bar{\gamma}(S,A)$ in the estimation variance is the average semi-variogram value between each point in the ‘sample set’ S and each point in the panel A. The term $\bar{\gamma}(A,A)$ is still the average semi-variogram between each point in the panel and each point in the panel. However, now we have yet another source of spurious variation. We only consider the average grade of the samples as the estimator, but $\bar{\gamma}(S,A)$ takes the individual grades into account. Thus we have also to subtract a $\bar{\gamma}(S,S)$ term from the variance, where this is the average semi-variogram value between each point in the sample set and each point in the sample set (i.e. 25 ‘pairs’ of samples). The final version of the estimation variance then becomes:

$$\sigma_e^2 = 2\bar{\gamma}(S,A) - \bar{\gamma}(S,S) - \bar{\gamma}(A,A)$$

The arithmetic mean is often known in Geostatistics as an extension estimator, and the above variance is referred to as the extension variance. To distinguish this variance from the more general estimation variance for a weighted average, the subscript e is used rather than the general $\varepsilon$.
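The three-term extension variance can be evaluated for the five samples of Table 4.1 and the panel of Fig. 4.2 in the same numerical spirit. As before, this is a sketch rather than a reference calculation: the panel orientation (60 ft east-west) and the 5 ft grid spacing are assumptions, and the helper names are made up for the example.

```python
import numpy as np

A_RANGE, SILL_C, NUGGET = 100.0, 700.0, 100.0  # spherical model quoted in the text

def gamma(h):
    """Spherical semi-variogram with nugget; exactly zero at h = 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / A_RANGE, 1.0)
    return np.where(h > 0, NUGGET + SILL_C * (1.5 * r - 0.5 * r**3), 0.0)

def gamma_bar(p, q):
    """Average semi-variogram over every pairing of a point in p with a point in q."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
    return float(gamma(d).mean())

# The sample set S: the five sample positions of Table 4.1.
samples = np.array([
    [4170.0, 2332.0], [4200.0, 2340.0], [4160.0, 2370.0],
    [4150.0, 2310.0], [4080.0, 2340.0],
])

# The panel A: 60 ft by 30 ft centred on (4150, 2340), on a 5 ft grid (assumption).
xs = 4120.0 + (np.arange(12) + 0.5) * 5.0
ys = 2325.0 + (np.arange(6) + 0.5) * 5.0
panel = np.array([[x, y] for x in xs for y in ys])

g_sa = gamma_bar(samples, panel)
g_ss = gamma_bar(samples, samples)  # 25 pairs, 5 of which are a sample with itself
g_aa = gamma_bar(panel, panel)

ext_var = 2.0 * g_sa - g_ss - g_aa
print(f"gamma-bar(S,A) = {g_sa:.1f}")
print(f"gamma-bar(S,S) = {g_ss:.1f}")
print(f"gamma-bar(A,A) = {g_aa:.1f}")
print(f"extension variance = {ext_var:.1f} (p.p.m.)^2")
```

The `g_ss` term needs no discretisation at all, since the samples are points: it is the plain average of the 25 entries in the sample-to-sample semi-variogram matrix.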

 

CALCULATION OF GAMMA-BAR TERMS

Having produced a formula for the extension variance, it only remains to explain how to calculate such terms as $\bar{\gamma}(S,A)$ in practice. For the sake of our (too) simplistic approach, we will consider for the moment only simple idealistic cases, and these only in one or two dimensions. Generalisation will be discussed later.

Fig. 4.3. Example of using a peripheral point to estimate the average value of the line segment.

 

Consider, as an example, the setup in Fig. 4.3. There is a length of, say, drive, l m long, whose grade is unknown. We have at our disposal a single sample, perhaps at a development heading, whose value is known. In our previous notation T is the average grade over l, T* is the grade at the sample position, A is the length and S is the single sample point. The reliability of this estimator is given by:

$$\sigma_e^2 = 2\bar{\gamma}(S,A) - \bar{\gamma}(S,S) - \bar{\gamma}(A,A)$$


$\bar{\gamma}(S,S)$ is the semi-variogram between the sample point and itself, which is zero because the sample is a ‘point’. $\bar{\gamma}(A,A)$ is none other than the F(l) function encountered in Chapter 3. Our problem arises with $\bar{\gamma}(S,A)$, which has been defined as the average semi-variogram between the sample point and every point in the line. That is, we take M as a fixed point (the sample) and M’ can be anywhere on the line. We take all such pairs that are possible, calculate the value of the semi-variogram for each pair, sum these (using an integration), and average this sum. Because the ‘sum’ is being performed over a continuous length, we cannot divide it by the ‘number of points’ in the sum. Instead we divide by the length of the line itself, l. This produces another auxiliary function which is called $\chi(l)$ and deals with the specific case of points on the end of lines. Thus our extension variance becomes:

$$\sigma_e^2 = 2\chi(l) - F(l)$$


It remains only to determine the function $\chi(l)$ for the particular model in use and the standard error is immediately available. The one-dimensional auxiliary functions are given below for the three ‘common’ models. Semi-variograms comprising more than one component model are easily handled. The auxiliary function for each component is evaluated and then the component auxiliary functions added together.

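Auxiliary functions

Linear model for the semi-variogram, $\gamma(h) = ph$:

$$\chi(l) = \frac{pl}{2}, \qquad F(l) = \frac{pl}{3}$$

Exponential model for the semi-variogram, $\gamma(h) = C\left[1 - \exp(-h/a)\right]$:

$$\chi(l) = C\left[1 - \frac{a}{l}\left(1 - e^{-l/a}\right)\right]$$

$$F(l) = C\left\{1 - \frac{2a}{l}\left[1 - \frac{a}{l}\left(1 - e^{-l/a}\right)\right]\right\}$$

Spherical model for the semi-variogram, $\gamma(h) = C\left[\frac{3h}{2a} - \frac{h^3}{2a^3}\right]$ for $h \le a$ and $\gamma(h) = C$ for $h \ge a$:

$$\chi(l) = \begin{cases} \dfrac{C}{8}\,\dfrac{l}{a}\left(6 - \dfrac{l^2}{a^2}\right) & l \le a \\[2ex] \dfrac{C}{8}\left(8 - \dfrac{3a}{l}\right) & l \ge a \end{cases}$$

$$F(l) = \begin{cases} \dfrac{C}{20}\,\dfrac{l}{a}\left(10 - \dfrac{l^2}{a^2}\right) & l \le a \\[2ex] \dfrac{C}{20}\left(20 - \dfrac{15a}{l} + \dfrac{4a^2}{l^2}\right) & l \ge a \end{cases}$$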

Thus in our example above, if we have a linear semi-variogram, the extension variance for the setup in Fig. 4.3 becomes:

$$\sigma_e^2 = 2\chi(l) - F(l) = 2\,\frac{pl}{2} - \frac{pl}{3} = \frac{2pl}{3}$$


For any specific problem, we need specify only the length of the line, l, and the slope of the semi-variogram, p.
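Since the auxiliary functions are just averages of the semi-variogram, any closed form can be checked by brute-force integration. The sketch below does this for the linear and spherical models; the closed forms coded here are those obtained by integrating the model semi-variograms (nugget excluded), the function names are invented, and the parameter values are arbitrary illustrations.

```python
import numpy as np

def chi_numeric(gamma, l, n=20000):
    """chi(l): average of gamma(h) between an end point and every point of a line of length l."""
    h = (np.arange(n) + 0.5) * l / n  # midpoint rule
    return float(np.mean(gamma(h)))

def f_numeric(gamma, l, n=20000):
    """F(l) = (2 / l^2) * integral from 0 to l of (l - h) * gamma(h) dh."""
    h = (np.arange(n) + 0.5) * l / n
    return float(2.0 * np.mean((l - h) * gamma(h)) / l)

p = 2.0              # slope of the linear model (illustrative value)
a, c = 100.0, 700.0  # range and sill of the spherical model (illustrative values)

def linear(h):
    return p * np.asarray(h, dtype=float)

def spherical(h):
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return c * (1.5 * r - 0.5 * r**3)

def chi_spherical(l):
    if l <= a:
        return c / 8.0 * (l / a) * (6.0 - (l / a) ** 2)
    return c / 8.0 * (8.0 - 3.0 * a / l)

def f_spherical(l):
    if l <= a:
        return c / 20.0 * (l / a) * (10.0 - (l / a) ** 2)
    return c / 20.0 * (20.0 - 15.0 * a / l + 4.0 * (a / l) ** 2)

l = 30.0
ext_var_linear = 2.0 * chi_numeric(linear, l) - f_numeric(linear, l)
print(f"linear model: numeric {ext_var_linear:.4f}, closed form 2pl/3 = {2 * p * l / 3:.4f}")

for length in (60.0, 150.0):
    print(f"spherical, l = {length:.0f}: "
          f"chi {chi_numeric(spherical, length):.3f} vs {chi_spherical(length):.3f}, "
          f"F {f_numeric(spherical, length):.3f} vs {f_spherical(length):.3f}")
```

For a model with a nugget effect, the nugget is simply added to both auxiliary functions, in line with the rule that component functions are evaluated separately and summed.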


Let us now consider a slightly more interesting example, such as that shown in Fig. 4.4.

 

Fig 4.4. Example of using a central point to estimate the average value of the line segment.

 

Here the point sample is in the middle of the line, but otherwise the situation remains the same. In:

$$\sigma_e^2 = 2\bar{\gamma}(S,A) - \bar{\gamma}(S,S) - \bar{\gamma}(A,A)$$

only the first term, $\bar{\gamma}(S,A)$, has changed. Rather than invent a new auxiliary function, or have to do the integration all over again, we can use the existing $\chi(l)$ function to produce the required term.

The term we require is as follows:


$\bar{\gamma}(S,A)$

=  the average semi-variogram value between the sample point and every point along the line

= (sum of all semi-variogram values between the sample point and every point along the line)/l

= (sum of all the semi-variogram values between the sample point and every point in the left hand half of the line + sum of all the values between the sample and the right hand half of the line)/l

 

Fig 4.5. Simplifying the central point problem to allow the use of auxiliary functions.

 

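Figure 4.5 illustrates the ‘splitting’ of the line so as to put the sample point at the end of two shorter lines. Now, $\chi(l/2)$ would give us the average of all the semi-variogram values between M (the sample point) and the M’ on the left hand half of the line. Returning to the definition of the $\chi$ function, it can easily be seen that the sum of all the semi-variogram values between M and M’ will be the average multiplied by the length of line under consideration, here $(l/2)\,\chi(l/2)$. By symmetry the right hand half gives the same sum.

Thus:

$$\bar{\gamma}(S,A) = \frac{\frac{l}{2}\,\chi\!\left(\frac{l}{2}\right) + \frac{l}{2}\,\chi\!\left(\frac{l}{2}\right)}{l} = \chi\!\left(\frac{l}{2}\right)$$

so that

$$\sigma_e^2 = 2\chi\!\left(\frac{l}{2}\right) - F(l)$$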

In a particular case the user may substitute their own model for the semi-variogram, and hence the appropriate auxiliary functions. Before moving on let us compare this result with the previous situation, where the sample lay at the end of the line. In the former case the extension variance was:

$$\sigma_e^2 = 2\chi(l) - F(l)$$

Because the semi-variogram does not decrease with distance, $\chi(l)$ must be greater than (or at least equal to) $\chi(l/2)$. The conclusion? If you can only take one sample, it is better to take it in the middle of what you are trying to estimate. It is reassuring to find that so-called common sense has a sound mathematical background.
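The comparison between the peripheral and the central sample can be put into numbers. The sketch below uses the linear model and, for contrast, an exponential model; the closed forms are those obtained by integrating the respective semi-variograms, and the parameter values (p = 2, a = 30, C = 1, l = 50) are arbitrary choices for the illustration.

```python
import math

def chi_linear(l, p=2.0):
    return p * l / 2.0

def f_linear(l, p=2.0):
    return p * l / 3.0

def chi_exponential(l, a=30.0, c=1.0):
    return c * (1.0 - (a / l) * (1.0 - math.exp(-l / a)))

def f_exponential(l, a=30.0, c=1.0):
    return c * (1.0 - (2.0 * a / l) * (1.0 - (a / l) * (1.0 - math.exp(-l / a))))

def variance_end(chi, f, l):
    """Extension variance with the single sample at one end of the line."""
    return 2.0 * chi(l) - f(l)

def variance_centre(chi, f, l):
    """Extension variance with the single sample at the midpoint of the line."""
    return 2.0 * chi(l / 2.0) - f(l)

l = 50.0
for name, chi, f in [("linear", chi_linear, f_linear),
                     ("exponential", chi_exponential, f_exponential)]:
    end = variance_end(chi, f, l)
    centre = variance_centre(chi, f, l)
    print(f"{name:12s} end: {end:8.4f}   centre: {centre:8.4f}")
```

For the linear model the two results are 2pl/3 and pl/6, so moving the sample from the end to the middle cuts the extension variance by a factor of four.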

Fig. 4.6 Generalisation of the ‘central’ point problem.

 

Using the same sort of logic on Fig. 4.6, you should be able to deduce that:

 


so that

 

 

Fig. 4.7. Extrapolation of the peripheral point problem.


Figure 4.7 at first sight seems to be a different kettle of fish. However, let us follow the same procedure and see where it leads.


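The point lies on the end of a ‘line’ of length l+b. The expression $(l+b)\,\chi(l+b)$ would give us the sum of all the semi-variogram values between the sample and the length l+b. However, we do not require the points corresponding to M’ within the length b, so we may subtract those in the form $b\,\chi(b)$. That is:

$$\bar{\gamma}(S,A) = \frac{(l+b)\,\chi(l+b) - b\,\chi(b)}{l}$$

so that

$$\sigma_e^2 = \frac{2\left[(l+b)\,\chi(l+b) - b\,\chi(b)\right]}{l} - F(l)$$

For the linear model, for example, this would be:

$$\sigma_e^2 = \frac{2}{l}\left[\frac{p(l+b)^2}{2} - \frac{pb^2}{2}\right] - \frac{pl}{3} = p(l + 2b) - \frac{pl}{3} = \frac{2pl}{3} + 2pb$$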


This is obviously larger than the expression when the point was on the end of the line, as would be expected.
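The splitting and subtracting arguments can be folded into one small routine that handles a sample anywhere along the direction of the line. This is a sketch under assumptions: the coordinate x (measured from one end of the line) and the function names are introduced here for the illustration, and the linear model with slope p = 2 is an arbitrary test case.

```python
def gamma_bar_point_line(x, l, chi):
    """Average semi-variogram between a point at coordinate x and the line [0, l].

    chi(length) must return the auxiliary function for a point at the end of a
    line of the given length.
    """
    if 0.0 <= x <= l:
        # Sample on the line: split into two shorter lines ending at the sample.
        total = 0.0
        for part in (x, l - x):
            if part > 0.0:
                total += part * chi(part)
        return total / l
    # Sample off the end: extend the line to reach it, then remove the gap b.
    b = -x if x < 0.0 else x - l
    return ((l + b) * chi(l + b) - b * chi(b)) / l

def extension_variance(x, l, chi, f):
    """2 * gamma-bar(S,A) - F(l) for a single point sample, for which gamma-bar(S,S) = 0."""
    return 2.0 * gamma_bar_point_line(x, l, chi) - f(l)

p = 2.0  # slope of the linear model (illustrative value)

def chi_linear(length):
    return p * length / 2.0

def f_linear(length):
    return p * length / 3.0

l = 30.0
for label, x in [("end of line", 0.0), ("centre", l / 2), ("10 beyond the end", -10.0)]:
    print(f"{label:18s} variance = {extension_variance(x, l, chi_linear, f_linear):.2f}")
```

With these values the three cases give 2pl/3, pl/6 and 2pl/3 + 2pb respectively, matching the expressions derived in the text for the linear model.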


One last example before we abandon one-dimensional examples: Fig. 4.8 shows the ‘same’ line, which now contains three samples.

Fig. 4.8. More complex problem when three samples are available to estimate the line segment.

 

We shall use the arithmetic mean of the three grades to estimate the length, i.e. T* = (g1 + g2 + g3)/3. Then our extension variance is

$$\sigma_e^2 = 2\bar{\gamma}(S,A) - \bar{\gamma}(S,S) - \bar{\gamma}(A,A)$$


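where S is now a set of three points. $\bar{\gamma}(A,A)$ remains unchanged, equal to F(l), since we have not changed the length to be estimated at all. However, $\bar{\gamma}(S,A)$ is now the average semi-variogram value between each of the three points and the line, so that

$$\bar{\gamma}(S,A) = \frac{1}{3}\left[\bar{\gamma}(S_1,A) + \bar{\gamma}(S_2,A) + \bar{\gamma}(S_3,A)\right]$$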


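where S1 represents sample 1 and so on. Now $\bar{\gamma}(S_1,A)$ is simply $\chi(l)$, as is $\bar{\gamma}(S_3,A)$. The term $\bar{\gamma}(S_2,A)$ is the same situation as that in Fig. 4.4, so this equals $\chi(l/2)$. Thus,

$$\bar{\gamma}(S,A) = \frac{1}{3}\left[\chi(l) + \chi\!\left(\frac{l}{2}\right) + \chi(l)\right] = \frac{1}{3}\left[2\chi(l) + \chi\!\left(\frac{l}{2}\right)\right]$$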


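The middle term of the variance, $\bar{\gamma}(S,S)$, requires us to take each point in the sample set with each point in the sample set. Since there are three points in the set, there are nine such pairs of points:

$$\bar{\gamma}(S,S) = \frac{1}{9}\left[\gamma(S_1,S_1) + \gamma(S_1,S_2) + \gamma(S_1,S_3) + \gamma(S_2,S_1) + \gamma(S_2,S_2) + \gamma(S_2,S_3) + \gamma(S_3,S_1) + \gamma(S_3,S_2) + \gamma(S_3,S_3)\right]$$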


Each of the individual terms is simply the semi-variogram between a pair of points. Three of the terms, $\gamma(S_1,S_1)$, $\gamma(S_2,S_2)$ and $\gamma(S_3,S_3)$, are automatically zero since the samples are points. The terms $\gamma(S_1,S_2)$, $\gamma(S_2,S_1)$, $\gamma(S_2,S_3)$ and $\gamma(S_3,S_2)$ are all equal to $\gamma(l/2)$, whilst $\gamma(S_1,S_3)$ and $\gamma(S_3,S_1)$ are equal to $\gamma$