Re: the midpoint

Brian Gaines
Thu, 6 May 1999 20:03:13 -0700

Chris Evans said:
>Certainly you can use "missing rating" markers and can still do
>various analyses, I can see ways in which cluster analysis could
>still be applied happily to such a matrix though I don't know of
>software that would do it effortlessly (nor should any software do it
>effortlessly as we'd stop thinking!) ... I don't
>know how REPGRIDII handles this and I'm seriously puzzled about
>how it does its PCA for odd or missing ratings.
>Brian/Mildred: would you spare a moment to explain that to us? I
>for one would really appreciate it.

Mildred had a section on the psychometrics of similarity in rep grid data
in her "On Becoming a Personal Scientist" and we have maintained that
analysis as we have moved to more complex data: categorical values, missing
values, 'any', 'all', 'none', etc.

For cluster analysis it is important to ensure that the data complies with
the underlying assumptions of the method, e.g. in principal components
analysis that the distance measure used is actually a formal distance
measure in the coordinate space (primarily the triangle inequality). We
ensure this by maintaining a distinction between data and meta-data
(indicating 'not applicable', 'any', 'all', 'none' ....). The distance
measure is such that: between data items it is the actual distance; between
data and meta-data it is the maximum of the (normalized) scale; between two
meta-data items it is determined by the meta-data semantics, by default, 0
between two identical meta-data items and maximum of scale between two
different ones.
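
As a rough illustration of the rule just described, here is a minimal
Python sketch. It is not the RepGrid code: the marker names, the 1..5
scale, and the summing of per-construct distances (city-block) to get an
element distance are all assumptions made for the example.

```python
# Hypothetical meta-data markers; names are illustrative only.
NA, ANY, ALL, NONE = "n/a", "any", "all", "none"
META = {NA, ANY, ALL, NONE}

def rating_distance(a, b, lo=1, hi=5):
    """Distance between two ratings, normalized so the scale maximum is 1."""
    a_meta, b_meta = a in META, b in META
    if a_meta and b_meta:
        # Default meta-data semantics: identical markers are at distance 0,
        # different markers are at the scale maximum.
        return 0.0 if a == b else 1.0
    if a_meta or b_meta:
        # Data versus meta-data: always the scale maximum.
        return 1.0
    # Data versus data: the actual (normalized) distance.
    return abs(a - b) / (hi - lo)

def element_distance(x, y, lo=1, hi=5):
    """Assumed aggregation: sum of per-construct distances (city-block)."""
    return sum(rating_distance(a, b, lo, hi) for a, b in zip(x, y))

if __name__ == "__main__":
    print(element_distance([1, NA, 3], [2, NA, 5]))
    print(element_distance([1, NA, 3], [1, 4, 3]))
```

Each per-construct distance satisfies the triangle inequality, and a sum
of such distances does too, which is the property the paragraph above
relies on.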

This guarantees a distance measure and good behaviour on the part of
distance-based cluster algorithms such as principal components and FOCUS.
How well it reflects the psychological expectations depends on what these
are. For example, would you expect two elements rated 'not applicable' on
all constructs to be maximally close, or maximally distant? Whatever
expectations you have can be plugged into the distance measure above.
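
To make the 'not applicable' question concrete, here is a small Python
sketch in which the meta-data semantics are a pluggable function. The
names (make_distance, meta_distance, the "n/a" marker) and the 1..5 scale
are hypothetical, chosen only for this example.

```python
NA = "n/a"  # hypothetical 'not applicable' marker

def make_distance(meta_distance, lo=1, hi=5):
    """Build a per-rating distance with pluggable meta-data semantics."""
    def d(a, b):
        # In this sketch, meta-data markers are strings and ratings are numbers.
        a_meta, b_meta = isinstance(a, str), isinstance(b, str)
        if a_meta and b_meta:
            return meta_distance(a, b)
        if a_meta or b_meta:
            return 1.0  # data versus meta-data: scale maximum
        return abs(a - b) / (hi - lo)
    return d

def total(d, x, y):
    """Sum the per-construct distances between two elements."""
    return sum(d(a, b) for a, b in zip(x, y))

# Expectation 1: identical markers are maximally close (the default).
close = make_distance(lambda a, b: 0.0 if a == b else 1.0)
# Expectation 2: any two meta-data items are maximally distant.
distant = make_distance(lambda a, b: 1.0)

e1 = [NA, NA, NA]
e2 = [NA, NA, NA]
print(total(close, e1, e2))    # 0.0
print(total(distant, e1, e2))  # 3.0
```

One caution with the second choice: it gives an element rated
'not applicable' a non-zero distance from itself, so whether the result is
still a formal distance measure depends on the semantics plugged in.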

Note, of course, that an element with all missing values will not
contribute to the results of PCA -- which is what one would expect. One
with a few missing values will tend to be away from fully-specified data
but could become close to it in a rotation that emphasizes the
fully-specified dimensions. It will also be close to data that has the same
pattern of unspecified values and the same ratings where values are specified.
That is reasonable 'rough justice' psychometrics, and the results are, at
least, well-behaved and easily explicable.

Some nice examples of how this technique works in practice are given in
our KAW98 paper, "WebGrid II: Developing hierarchical knowledge structures
from flat grids", which shows
how FOCUS perfectly reconstructs a hierarchy in some artificial data with
'not applicable' values, and in some actual hierarchical concept maps.