MclustDRsubsel.Rd
Implements a subset selection method for selecting the relevant directions spanning the dimension reduction subspace for visualizing the clustering or classification structure obtained from a finite mixture of Gaussian densities.
MclustDRsubsel(object, G = 1:9, modelNames = mclust.options("emModelNames"), ..., bic.stop = 0, bic.cutoff = 0, mindir = 1, verbose = interactive())
object  An object of class 

G  An integer vector specifying the numbers of mixture components or clusters. 
modelNames  A vector of character strings indicating the models to be fitted. See 
...  
bic.stop  A criterion to terminate the search. If maximal BIC difference is less than

bic.cutoff  A value specifying how to select simplest ``best'' model within 
mindir  An integer value specifying the minimum number of directions to be estimated. 
verbose  A logical or integer value specifying if and how much detailed information should be reported during the iterations of the algorithm.

The GMMDR method aims at reducing the dimensionality by identifying a set of linear combinations, ordered by importance as quantified by the associated eigenvalues, of the original features which capture most of the clustering or classification structure contained in the data. This is implemented in MclustDR
.
The MclustDRsubsel
function implements the greedy forward search algorithm discussed in Scrucca (2010) to prune the set of all GMMDR directions. The criterion used to select the relevant directions is based on the BIC difference between a clustering model and a model in which the feature proposal has no clustering relevance. The steps are the following:
1. Select the first feature to be the one which maximizes the BIC difference between the best clustering model and the model which assumes no clustering, i.e. a single component.
2. Select the next feature amongst those not previously included, to be the one which maximizes the BIC difference.
3. Iterate the previous step until all the BIC differences for the inclusion of a feature become less than bic.stop
.
At each step, the search over the model space is performed with respect to the model parametrisation and the number of clusters.
An object of class 'MclustDRsubsel'
which inherits from 'MclustDR'
, so it has the same components of the latter plus the following:
The basis of the estimated dimension reduction subspace expressed in terms of the original variables.
The basis of the estimated dimension reduction subspace expressed in terms of the original variables standardized to have unit standard deviation.
Scrucca, L. (2010) Dimension reduction for modelbased clustering. Statistics and Computing, 20(4), pp. 471484.
Scrucca, L. (2014) Graphical Tools for Modelbased Mixture Discriminant Analysis. Advances in Data Analysis and Classification, 8(2), pp. 147165
Luca Scrucca
if (FALSE) { # clustering data(crabs, package = "MASS") x < crabs[,4:8] class < paste(crabs$sp, crabs$sex, sep = "") mod < Mclust(x) table(class, mod$classification) dr < MclustDR(mod) summary(dr) plot(dr) drs < MclustDRsubsel(dr) summary(drs) table(class, drs$classification) plot(drs, what = "scatterplot") plot(drs, what = "pairs") plot(drs, what = "contour") plot(drs, what = "boundaries") plot(drs, what = "evalues") # classification data(banknote) da < MclustDA(banknote[,2:7], banknote$Status) table(banknote$Status, predict(da)$class) dr < MclustDR(da) summary(dr) drs < MclustDRsubsel(dr) summary(drs) table(banknote$Status, predict(drs)$class) plot(drs, what = "scatterplot") plot(drs, what = "classification") plot(drs, what = "boundaries")}