>
>Now, a little gobsmacked by Patrick's example of different grid distance
>metrics as sent in this last e-mail and not really having the wit to
>follow his correspondence with J. Maxwell in any sort of detail, I was
>nevertheless reminded that I once offered a small number of students two
>variants of a grid each had provided me with. One was cluster-analysed
>using sums of distances, and the other using sums of (distances squared).
>Having shown them how to read back a dendrogram, I asked them which felt
>more "right" as a representation of their construct structure. To a man
>(they were all male!) they preferred the sums of (distances squared)
>variant, which surprised me, since the dendrograms tended to be flatter,
>i.e., less obviously delineated clusters and in that sense, less readily
>interpretable at a first glance.
Why do people keep posting interesting comments when I am trying to write a
thousand papers for the Berlin Conference?
The distances people talk about are all part of one big (not necessarily
happy) family called the Minkowski metrics.
The distance between points i and j, D(i,j), is a function as follows:

              m
D(i,j) = {   Sum  |Xip - Xjp|**n }**(1/n)
             p=1
We take the absolute difference between factor loadings on each factor,
raise it to the power n, and sum this across the m dimensions. We then take
the n-th root of this sum. In Devi's sums-of-differences approach n=1
(known as a 'city-block' metric); for Euclidean distances n=2. In most
multidimensional scaling programs you can set n to any value. But little is
known about the effect of varying n. As Devi points out, Euclidean
distances suffer from flat affect. Perhaps this is why SPSS has
squared Euclidean distance (n=2 without the final square root) as its
default.
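
For anyone who wants to play with n, here is a minimal Python sketch of the
formula above. The function name minkowski_distance and the two points are
my own made-up illustrations (chosen for easy arithmetic, not taken from
any real grid), so treat it as a rough aid rather than what any particular
package actually does.

```python
def minkowski_distance(x, y, n):
    """Minkowski distance of order n between two equal-length points.

    Hypothetical helper for illustration: x and y hold the coordinates
    (e.g. factor loadings) of points i and j on each of the m dimensions.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Sum of |Xip - Xjp|**n over the m dimensions, then the n-th root.
    total = sum(abs(a - b) ** n for a, b in zip(x, y))
    return total ** (1.0 / n)


# Made-up coordinates on m = 3 dimensions (assumed values, not real data).
xi = [0, 0, 0]
xj = [3, 4, 0]

city_block = minkowski_distance(xi, xj, 1)   # n = 1
euclidean = minkowski_distance(xi, xj, 2)    # n = 2
squared_euclidean = euclidean ** 2           # n = 2, root left off

print(city_block, euclidean, squared_euclidean)
```

With these points the city-block distance is 3 + 4 + 0 = 7, and the
Euclidean distance is the square root of 9 + 16 + 0, i.e. 5.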
Richard
Richard C. Bell
Associate Professor
Department of Psychology
University of Melbourne
Parkville Vic 3052 Australia
ph: +61 3 9344 6364
fax: +61 3 9347 6618