Tutorial
Session Two — Spatial Visualisation
The example session with EcoSSe described below is intended to familiarise the
user with the package. It illustrates one possible set of analyses which may be
carried out. One of the most neglected aspects of statistical analysis,
especially of spatial data, is the purely visual assessment of the sample data.
This session takes you through the following sequence of analyses:
• Post plotting the sample data
• Finding the nearest neighbour distribution for inter-sample distance
• Inverse distance interpolation mapping
There are many other facilities within the package, which are given as
alternative options on the menus. To start the tutorial, choose EcoSSe from your
Start menu. See Tutorial One for notes on starting an EcoSSe run and specifying
the ghost file.

As
you can see from the above I have elected to read in a set of sample data by
clicking on the
option and selecting
from the menu which appears. EcoSSe
will remember the last five data files accessed and include these in your
options. Three input file types can be read in. I will read in a standard
Geostokos data file.
I
will select WOLFCAMP.DAT for my input data file.
This is a set of 85 samples of hydrogeological data taken from the Wolfcamp
aquifer in
The layout of such files is described in detail in the main EcoSSe
documentation. The routine which reads in the data shows the first 10 lines of
your data file so that you can check that it is being read correctly. The
routine also checks whether the file actually holds the correct number of
samples and informs you if there is any discrepancy.

Even
if you select a file from the list of previously analysed data files, EcoSSe
will ask you to confirm your choice. This is actually a quick way of getting
back to your working directory, since you can change your choice at this point.
Be warned, though, that if you change which file you want to read it must be
the same type of file – that is, if you are reading a standard Geostokos data
file, you cannot change your mind at this point and read in a CSV type file.

For
this example, we will stick with WOLFCAMP. As your data is read in,
it is stored on a working binary file. A progress bar will indicate how far the
process has gone. When data input is complete, your Window should look like the
table above.
The
routine which has been used shows the first 10 lines of your data file so that
you can check it is going in OK.
When
the data has been read in, you will see that the “greyed out” options on the
main menu bar will be activated. We use the menu bar to select an option,
say:

This time we have chosen to display and summarise the data set in a spatial
sense. A post plot is a map showing the locations of the samples. Each sample
will be coloured and shaded according to the value of a selected variable. Since
we are analysing the Wolfcamp data, the choice of variables should prove fairly
simple!
The
screen will prompt you to choose the three variables for the analysis. You will
see two dialog boxes: the one in the top left hand corner lists the variables
available for analysis in your data file; the bottom right box shows the
variables already chosen (at this point, none!).
The routine needs to have information on the position of the samples and on the
value at each sample location. This particular data file only contains three
variables. However, EcoSSe does not know (as yet) which of these variables is
which.
There is a lot of information on the screen. At the bottom of the Window, you
see the “status bar” which shows the name of the current data file and the title
read from that file. The “already chosen” dialog box shows you that you are
expected to select variables to be the “X (east/west) co-ordinate”, “Y
(north/south) co-ordinate” and “Measurement to be analysed” for your post plot.
The upper left dialog box lists the variable names as they appeared in the data
file and is prompting you to choose the variable which will be the “X
co-ordinate” on the map. For this example, let us choose Easting for the X
co-ordinate:
We
may then choose “Northing” for the Y co-ordinate:
Finally,
we must choose the variable to be analysed and state any relevant
transformations to be made. For this data we require no transformation of the
variable “Potentiometric Level”, so click on
.
The
dialog now shows the complete set of chosen
variables and has moved to the upper left corner. You have the option to change
your mind here by clicking on
.

This
choice of variables is acceptable, so click on
to proceed. This may seem tedious to you at
the moment, but (later) try running the program with another set of data with
more variables. Or try a data set where the columns are in a different order.
The EcoSSe
input routine has been written to allow you this flexibility in building
your data files.
The
software will suggest contour levels for the shaded plot.

These
may be altered as you desire. Click on
to proceed with the mapping. To plot a map,
you have to specify the area which you wish to be mapped. EcoSSe will offer a default
rectangular area which covers all of the sample locations. You may accept this
default or you may prefer a different rectangle or an irregularly shaped area.
The last choice can only be made if you have already stored the boundary of the
area as a set of vertices of a polygon. The current version of EcoSSe
can handle polygonal boundaries with up to 500 vertices.
Clicking in the polygonal “radio button” will cause the software to ask you for
a file containing the boundary information. You may have up to 500 vertices
stored on a file. They may be stored either clockwise or anti-clockwise and the
polygon does not have to be “closed”. The default name for the boundary line
file is that of the original data file plus the extension .BLN (Boundary LiNe).
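
To see why the order of the vertices and the “closing” of the polygon need not
matter, here is a minimal sketch of one common way of testing whether a point
lies inside a polygonal boundary (the “ray casting” test). The documentation
does not say how EcoSSe itself makes this test, so the function name
point_in_polygon and the co-ordinates are purely illustrative.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        # The modulo joins the last vertex back to the first, so the
        # stored polygon does not have to be "closed".
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # X value at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square boundary, stored anti-clockwise and not closed
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(point_in_polygon(5.0, 5.0, square))
print(point_in_polygon(15.0, 5.0, square))
```

Reversing the list of vertices (clockwise storage) gives the same answers,
since the test only counts how many edges are crossed.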

Accepting
this boundary returns us to the area definition dialog.

You
may respecify the minimum and maximum X and Y values at this point. The
defaults given in the dialog are the full extent of the polygonal boundary.
However, there are times when you might want to have the estimated grid points
on some regular grid starting at a standardised value. For example, changing “Minimum
Y value” to 0 would mean that the bottom left hand corner of the grid used
would be at X = 135, Y = 0. Click on
to accept this boundary definition.

If
you only wanted to see a subsection of the data (equivalent to zooming in) you
could specify a boundary for a smaller rectangle or polygon. For example:

will
read in a boundary from the file county.bln which is a small area
within the Wolfcamp study area. The Wolfcamp data set covers the

Clicking
on the
button will allow EcoSSe to plot the sample data
(see next page).
You
can copy the plot with
+
and paste it into another application. Some systems (notably Windows NT)
require pressing
+
. This will place a copy of the Window in the
clipboard. You can import the picture
into a word processing application such as Microsoft Word, a spreadsheet
application like Lotus or Excel, or paste
+
into many applications, such as MSPaint.
The text information is also copied to the GHOST.LIS file.
Sample layout - nearest neighbour analysis
When
analysing spatial data, one of the most important types of information we need
is the spacing between the samples. This will help us to choose search radii in
estimation routines so as to balance density of sampling against computation
time. A large search radius will ensure the inclusion of large numbers of
samples. However, if too large a radius is selected, the software will spend
more time in eliminating the excess samples than in finding the relevant ones.
The
inter-sample distance is also useful in determining the grouping intervals for
experimental semi-variogram calculation. If the sampling is extremely
irregular, it may be difficult to establish an optimum distance interval
empirically.
A third use of “nearest neighbour” analysis is in the identification of
duplicate sampling before kriging. The kriging routines in EcoSSe assume that
you wish estimation to “honour” the sample data. This is difficult to do if you
have two samples at the same location! You should also bear in mind the limited
numerical precision of micro-computers. Most PC software works with an effective
precision of around 8 or 9 significant figures. If your co-ordinates are in the
millions (such as in the LO system) the computer will not be able to distinguish
between samples less than, say, half a metre apart. Whilst EcoSSe provides a
facility for “stripping” the redundant leading digits, this may still lead to
problems.
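
The precision problem can be illustrated with a short sketch. It assumes that
co-ordinates are held as 32-bit single precision numbers; this is an assumption
made for the illustration, since the documentation does not say how EcoSSe
stores them. The eastings and the offset are made-up values.

```python
import struct


def to_single(value):
    """Round a Python float (double precision) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Two hypothetical eastings, 0.1 of a metre apart, with values in the millions
east_a = 5_000_000.1
east_b = 5_000_000.2

# Stored as they are, the two values collapse onto the same number
same_raw = to_single(east_a) == to_single(east_b)

# "Stripping" the redundant leading digits before storing keeps them apart
offset = 5_000_000.0
same_stripped = to_single(east_a - offset) == to_single(east_b - offset)

print(same_raw, same_stripped)
```

Under this assumption the two samples are indistinguishable until the leading
digits are removed, which is the situation the stripping facility is meant to
help with.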
A
routine, then, is provided for calculating and storing (on request) nearest
neighbour distances between sample locations. Remember that this type of
calculation will take exactly twice the time of a corresponding semi-variogram
analysis, since it must pair every sample up with every other sample --- both
ways. Selecting nearest neighbour analysis:

The
routine needs two co-ordinates for the sample locations. If you have not chosen
any variables before this, you will have to select “X co-ordinate” and “Y
co-ordinate” as described previously. Since we already have these variables
selected, we will be offered the choice to keep them:

Click
on
to continue with these variables. You will be
prompted for the “threshold” distance defining when samples are too close
together.

For
this illustration we have chosen the value of 0.1 mile as our criterion for
samples being too close together. If you
have also elected to store the results on a file, the default extension for
nearest neighbour files will be .NND. Progress bars will indicate how the
calculations are going. Don’t get too impatient. This is one of the most time
consuming exercises available in EcoSSe. When all the nearest neighbours have
been identified --- and stored on file --- a histogram is constructed and
displayed showing the distribution of inter-sample distances.

The
Window also gives various summary statistics. For example, the average distance
to the nearest samples is over 9 miles. However there are several pairs of
samples closer than 1 mile and some where the nearest well is over 20 miles
away.
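
The calculation behind these statistics can be sketched as follows. This is a
simple brute force version which pairs every sample with every other sample; it
is not the EcoSSe routine itself, and the function name and co-ordinates are
invented for the illustration (they are not the Wolfcamp data).

```python
import math


def nearest_neighbour_distances(points):
    """For each sample, return (distance, index) of its nearest other sample."""
    result = []
    for i, (xi, yi) in enumerate(points):
        best_d, best_j = math.inf, None
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue  # a sample is not its own neighbour
            d = math.hypot(xi - xj, yi - yj)
            if d < best_d:
                best_d, best_j = d, j
        result.append((best_d, best_j))
    return result


# Made-up sample locations, in miles
samples = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.05), (20.0, 0.0)]
nn = nearest_neighbour_distances(samples)

# Average distance to the nearest sample
mean_nn = sum(d for d, _ in nn) / len(nn)

# Samples whose nearest neighbour is closer than the threshold distance
threshold = 0.1
too_close = [(i, j) for i, (d, j) in enumerate(nn) if d < threshold]
print(mean_nn, too_close)
```

Note that a pair of close samples is reported twice, once from each side, which
reflects the “both ways” pairing described above.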
As an illustration, I repeated the analysis with a threshold distance of 1
mile(!). Three samples were found to have nearest neighbours closer than one
mile. The summary histogram looks rather like the one above but with the
addition of a bar at the top of the page:
Because there were sample pairs within the defined “too close together”
distance, you will be asked to click the mouse to continue. Since there are
“duplicates”, a new dialog displaying the relevant information is given:

The
grid in the dialog shows which samples (by number) were giving problems and the
co-ordinates of the first of these samples. The actual distance between the two
is also given, so that you can check whether they are really duplicates or just
close together. In this case it is clear that the samples are not (in any
sense) duplicates.
Should
you really have cause for concern, you may create a new data file with these
samples eliminated from the data set. It is, obviously, preferable for you to
review your data and find out just why you have duplicated samples. If this is
a normal part of your type of data, you will have to do something about this
before you try any geostatistical estimation such as kriging. One option
available within EcoSSe is to “decluster” the data by averaging into small
rectangular cells. This option is available on the same menu as the other data
manipulation routines.
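
As a rough sketch of what declustering by cell averaging involves, the example
below groups samples into rectangular cells and replaces each group by its
average. The details are assumptions made for the illustration: the
documentation does not say how EcoSSe positions the cells or whether the
co-ordinates are averaged along with the measurements. The function name
decluster and the data are hypothetical.

```python
from collections import defaultdict


def decluster(samples, cell_x, cell_y):
    """Average samples (x, y, value) which fall in the same rectangular cell."""
    cells = defaultdict(list)
    for x, y, value in samples:
        # Cell index, counting from the origin (an assumed cell layout)
        key = (int(x // cell_x), int(y // cell_y))
        cells[key].append((x, y, value))
    averaged = []
    for members in cells.values():
        n = len(members)
        # Average the co-ordinates and the measurement together
        averaged.append(tuple(sum(m[k] for m in members) / n for k in range(3)))
    return averaged


# Two made-up samples close together and one well away from them
data = [(1.0, 1.0, 10.0), (1.5, 1.0, 20.0), (7.0, 7.0, 30.0)]
print(decluster(data, 5.0, 5.0))
```

The two close samples are merged into a single averaged sample, so no pair of
samples closer than the cell size is left within any one cell.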
If you request an output file with the duplicates eliminated, the software will
prompt for a name for the new file, whose default extension will be .DAT. Since
this is the same extension as the original data file, no default name is
provided, so that you do not overwrite your original data file by mistake. The
problem samples will still be on the new data file, but all the measurements
will have been replaced by “missing” values. This is a rather unsubtle way of
making sure that you do not disrupt your kriging system with duplicate samples.
Mapping with inverse distance weighting
Back
to the main menu:

Interpolating
a grid of points will produce a sketch map of the sample values. This map
reflects the actual values measured at the actual sample locations and uses a
weighted average estimator for grid points which have not been sampled. Weights
are chosen as follows:
• the distance between sample and unsampled point is calculated;
• a selected function of that distance is calculated;
• weights are distributed between the samples according to this distance
  function (total weights add up to one).
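
The three steps above can be sketched in a few lines. The example uses an
inverse power of the distance as the “selected function”, which is only one
possible choice; the function name idw_estimate, the default power of 2 and the
sample values are assumptions made for the illustration, not the EcoSSe
routine.

```python
import math


def idw_estimate(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (xs, ys, value) samples."""
    raw = []
    for xs, ys, value in samples:
        # Step 1: distance between sample and unsampled point
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value  # at a sample location, return the measured value
        # Step 2: the selected function of that distance
        raw.append((1.0 / d ** power, value))
    # Step 3: rescale so that the weights add up to one
    total = sum(w for w, _ in raw)
    return sum((w / total) * v for w, v in raw)


# Two made-up samples; the point half way between them gets equal weights
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_estimate(1.0, 0.0, samples))
```

A higher power gives more of the weight to the closest samples, so the map
follows the individual sample values more tightly.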
EcoSSe will remember everything
which has been defined during this run. So far, we have defined: which
variables we have been analysing and a boundary for the area being
estimated:

The
routine also needs to know whether you want the results stored on a “grid” file:

The
default name for a grid file is the original data file name with the extension .GID. EcoSSe will suggest contour levels based on the
variability of the sample values.

You
can change these if you so desire. Alternatively you can run with the default
contours and draw prettier maps by reading the grid files back in. Please note
that “grid” files are not in the same format as “data” files. If you want to
read them back in, you must use the option:

The
software offers several alternative inverse distance weighting functions:

Click
on the relevant button to make your choice. If you choose
or
the lower boxes will activate so that you can
specify the power you require:

Once
you have selected your weighting function, you need to define search parameters
and the area which is to be studied. The neighbouring samples will be used to
produce an estimate at each unsampled grid point. Before we can go any further,
we need to define the “neighbourhood”. That is, how far do we want the software
to search for samples to be included in the estimation process.
EcoSSe cannot guess what an appropriate search radius would be. As a simple
default, we choose an area which will contain (on average) 20 samples. This is
found by simply dividing the rectangular area around the samples by the number
of samples and then multiplying this area by 20. Finding the radius of the
circle with this area produces a likely search radius. When the value at a
specified grid point is being estimated, all samples within this circle of the
point will be used in the estimation process. If there are too many samples
within this circle, those closest to the “unsampled” location will be selected.
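
The arithmetic for the default search radius, as described above, can be
written out as a short sketch. The function name and the example rectangle are
hypothetical; only the rule itself (the area per sample, times 20, turned into
the radius of a circle) comes from the description.

```python
import math


def default_search_radius(x_min, x_max, y_min, y_max, n_samples, target=20):
    """Radius of the circle expected to hold `target` samples on average."""
    area = (x_max - x_min) * (y_max - y_min)  # rectangle around the samples
    area_for_target = area / n_samples * target
    # Area of a circle is pi * r squared, so r = sqrt(area / pi)
    return math.sqrt(area_for_target / math.pi)


# Made-up example: 100 samples spread over a 100 by 100 rectangle
radius = default_search_radius(0.0, 100.0, 0.0, 100.0, 100)
print(radius)
```

In this example each sample “owns” 100 square units on average, so the circle
must cover 2000 square units, giving a radius of a little over 25 units.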

In
this run, we already defined a boundary of interest to us. If you wish to
change this boundary and, say, look at the whole Wolfcamp area, simply click on
.

Once
you have chosen the area to be studied, you must define the grid spacing to be
used. Points will be calculated at each grid node and represented on the screen
as a shaded rectangle of the appropriate size.
Since
we have not previously specified a grid spacing or number of grid points, the
software defaults to 25 points in the X direction and the same grid spacing in
the Y direction. The grid does not have to be square, but the map may look a
little strange if it isn’t! We can alter the grid spacing by changing the
number in the relevant box:

If
you make a change and want to check how many grid points you have before
proceeding, click on
and the rest of the parameters will be
updated. You may also change minimum and maximum X and Y values at this stage.
Once you click on
the map parameters will be defined.
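
The default grid described above (25 points in the X direction, with the same
spacing reused in the Y direction) can be sketched as follows. The sketch
assumes that grid nodes sit on both the minimum and maximum X values, so that
25 points give 24 intervals; the documentation does not confirm this detail,
and the function name and limits are invented for the illustration.

```python
def default_grid(x_min, x_max, y_min, y_max, nx=25):
    """Return (spacing, nx, ny) for a grid with the same spacing in X and Y."""
    # nx nodes spanning the X range leave nx - 1 intervals between them
    spacing = (x_max - x_min) / (nx - 1)
    # Reuse the X spacing in Y and count the nodes which fit in the Y range
    ny = int((y_max - y_min) / spacing) + 1
    return spacing, nx, ny


# Made-up map limits: 240 units wide and 120 units high
print(default_grid(0.0, 240.0, 0.0, 120.0))
```

With these limits the spacing is 10 units, giving 25 points across and 13 points
up the map.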

Interpolating
a grid of points produces a sketch map on the screen. The shading information for the contour
levels will appear in the left hand box and the map itself in the right. A
shaded square will be displayed on the map to show you which point is being
estimated in addition to the information in the prompt box. You may copy the
screen to your printer at any stage during the estimation process.

Finishing the Tutorial

Clicking
on this menu item or on
will end your run with the software. You will
see the closing down dialog box:

The above Tutorial session should serve only to illustrate a possible use of the
various routines from EcoSSe. Try running the program again, choosing your own
responses. Try looking at reef width instead of grade. This variable has a
standard two parameter lognormal distribution. Try reading in one of the other
data files which are provided, say, samples.dat.
General Notes
There are a few points which you may have noted in following the Tutorial
session above. Most of the routines communicate between themselves, without you
having to worry about getting the right information from one to the other. For
example, after you read in the complete contents of the data file, the routines
ask which of the variables you actually want to analyse. This information is
then stored internally and may be accessed by any of the other routines. This is
a feature of most of EcoSSe, in that it will recall what you chose previously
and ask whether this is to change or not. You should bear this in mind if you
are analysing more than one data file in a single run. In particular, the
boundary used in mapping will be remembered. If you change data file, or even
which variables you analyse, the boundary will not automatically update.
A
copy of this run should have been made on a file called GHOST.LIS unless you changed the name at the beginning
of the run. Send this file to your printer if you want a record of the analysis
or look at it with Wordpad or Notepad.
EcoSSe —
like any computer software — is not completely error-free. Neither is it
fool-proof. You can always get out of the software by pressing the
,
and
keys at the same time. This will invoke the ‘End
Task’ facility to close the Window without damaging the rest of your system. If
you cannot figure out what went wrong, note down as much information as you can
about the program you were running, the data you were using and exactly where
it broke down. Contact your supplier locally or Geostokos direct for
assistance. Send us the ghost.lis file and (if you can) the
data you were analysing at the time.