Clusters defined by distinct correlation structures

N.T. Longford

Abstract

In the established view, clusters are subsets (or subpopulations) whose locations (centroids) are well separated relative to their dispersions. This paper discusses a largely neglected way in which meaningful subpopulations can be defined for multivariate outcomes: clusters may be distinguished by their covariance or correlation structures. Mixture modelling of multivariate data is well suited to studying such clusters and requires no adaptation.
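As an illustration only, the following minimal sketch simulates two bivariate clusters with the same centroid but opposite correlations (+0.8 and -0.8) and recovers them by EM for a normal mixture. The zero means held fixed, the simulated data, the starting values and the function names (`log_gauss_zero_mean`, `fit_covariance_mixture`) are assumptions made for this example; they are not taken from the analyses of ECHP or EU-SILC.

```python
import numpy as np

def log_gauss_zero_mean(x, cov):
    """Log-density of N(0, cov) evaluated at each row of x."""
    d = x.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ij,jk,ik->i", x, np.linalg.inv(cov), x)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

def fit_covariance_mixture(x, start_covs, n_iter=200):
    """EM for a mixture of zero-mean normals that differ only in covariance."""
    covs = [c.copy() for c in start_covs]
    weights = np.full(len(covs), 1.0 / len(covs))
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation
        logp = np.stack(
            [np.log(w) + log_gauss_zero_mean(x, c) for w, c in zip(weights, covs)],
            axis=1,
        )
        logp -= logp.max(axis=1, keepdims=True)  # guard against underflow
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update the mixing weights and the weighted covariance matrices
        weights = resp.mean(axis=0)
        covs = [(x * r[:, None]).T @ x / r.sum() for r in resp.T]
    return weights, covs, resp

# Simulated data (assumption): same centroid, opposite correlations.
rng = np.random.default_rng(0)
cov_a = np.array([[1.0, 0.8], [0.8, 1.0]])
cov_b = np.array([[1.0, -0.8], [-0.8, 1.0]])
x = np.vstack([
    rng.multivariate_normal([0.0, 0.0], cov_a, 500),
    rng.multivariate_normal([0.0, 0.0], cov_b, 500),
])

# Mildly informative starting values so that the two components can separate.
start = [np.array([[1.0, 0.3], [0.3, 1.0]]), np.array([[1.0, -0.3], [-0.3, 1.0]])]
weights, covs, resp = fit_covariance_mixture(x, start)
corrs = [c[0, 1] / np.sqrt(c[0, 0] * c[1, 1]) for c in covs]
print("mixing weights:", weights)
print("estimated correlations:", corrs)
```

A method that looks for separated centroids would find nothing to split in these data, because both clusters are centred at the origin; the mixture separates them through the covariance matrices alone.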

Experience from mixture analyses of income in the European Community Household Panel (ECHP) and in the longitudinal part of the European Union Statistics on Income and Living Conditions (EU-SILC) will be discussed. The relation of correlation structures to patterns of annual household income, and to its stability in particular, will be highlighted.

An improper component is included in the analysis to accommodate observations with aberrant patterns of income, often with a very small income in one year. It helps to reduce the number of components and to distil the patterns in each proper component.
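One common way to formulate an improper component, assumed here purely for illustration, is to give it a constant "density" over the whole outcome space, so that it absorbs observations that no proper component fits well. The sketch below computes posterior membership probabilities under that assumption. The function name `responsibilities_with_improper`, the constant 1e-4, the mixing weights and the two example points are all hypothetical choices, not values from the analyses described above.

```python
import numpy as np

def responsibilities_with_improper(log_dens, weights, log_const):
    """
    Posterior probabilities of component membership.

    log_dens  : (n, K) array of log-densities under the K proper components
    weights   : length K + 1 mixing weights, the last for the improper component
    log_const : log of the constant 'density' assumed for the improper component
    """
    n = log_dens.shape[0]
    full = np.column_stack([log_dens, np.full(n, log_const)]) + np.log(weights)
    full -= full.max(axis=1, keepdims=True)  # guard against underflow
    resp = np.exp(full)
    return resp / resp.sum(axis=1, keepdims=True)

# Two hypothetical households with two (standardised) annual incomes each:
# the first is typical, the second has a very small income in one year.
x = np.array([[0.1, -0.2], [0.2, -8.0]])

# A single proper component, the bivariate standard normal (assumption).
log_dens = (-np.log(2 * np.pi) - 0.5 * (x ** 2).sum(axis=1))[:, None]

resp = responsibilities_with_improper(
    log_dens, weights=np.array([0.95, 0.05]), log_const=np.log(1e-4)
)
print(resp)  # columns: proper component, improper component
```

With these settings the typical observation is assigned almost entirely to the proper component and the aberrant one almost entirely to the improper component, so the aberrant observation has little influence on the estimates for the proper component.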

References:

Longford, N.T., and Pittau, G.P. (2006).
Stability of household income in the European countries in the 1990's.
Computational Statistics and Data Analysis 51, 1364-1383.

Longford, N.T., and Nicodemo, C. (2011).
A mixture analysis of income in European countries. Unpublished.

March 2011.