The Mclust function is as follows:
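A sketch of the call signature, paraphrased from the mclust package documentation (defaults may differ across package versions):

```r
Mclust(data, G = NULL, modelNames = NULL, prior = NULL,
       control = emControl(), initialization = NULL, ...)
```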
The various parameters of the Mclust function are explained in the following table:
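The main parameters, paraphrased from the package documentation (details may vary by version):

| Parameter | Description |
| --- | --- |
| data | A numeric vector, matrix, or data frame of observations to be clustered. |
| G | The number(s) of mixture components (clusters) to try; defaults to 1 through 9. |
| modelNames | The covariance model name(s) to consider, such as EII, VII, or VVV. |
| prior | An optional prior used to regularize the estimates. |
| control | Control parameters for the EM algorithm, usually supplied by emControl(). |
| initialization | Optional starting values for the EM iterations. |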
The Mclust function uses a model when trying to decide which items belong to a cluster. There are different model names for univariate, multivariate, and single-component datasets. In each case, the idea is to select a model that describes the data; for example, EII describes spherical clusters of equal volume, while VII describes spherical clusters whose volume varies from cluster to cluster.
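As an illustration, the candidate models can be restricted through the modelNames argument; a minimal sketch, where my_data stands in for any numeric matrix of observations:

```r
# compare only the two spherical models: equal volume (EII)
# versus varying volume (VII), over one to five clusters
fit <- Mclust(my_data, G = 1:5, modelNames = c("EII", "VII"))
```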
First, we must load the library that contains the Mclust function (we may need to install it in the local environment) as follows:
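```r
# install the package once if it is not already available locally
install.packages("mclust")
# load it into the current session
library(mclust)
```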
We will be using the iris data in this example, as shown here:
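The column names used later (X5.1, X1.4) and the n = 149 reported by the summary suggest the data was read from the UCI repository CSV, whose first data row gets consumed as a header; a sketch under that assumption:

```r
# read the raw iris CSV; because the file has no header row,
# read.csv turns the first observation into column names such as
# X5.1, X3.5, X1.4, and X0.2, leaving 149 data rows
iris <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
```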
Now, we can compute the best fit via EM (note the capitalization of Mclust) as follows:
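A sketch of the fitting step; excluding the species label column is an assumption here, since Mclust expects numeric data:

```r
# let Mclust search over models and cluster counts, choosing the
# best combination by BIC; only the four numeric columns are used
fit <- Mclust(iris[, 1:4])
```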
We can display our results as follows:
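```r
# printing the fitted object gives only a brief description of the fit
fit
```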
Simply displaying the fit object doesn't tell us very much; it shows just what was used to compute the density of the dataset.
The summary command presents more detailed information about the results, as listed here:
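```r
# the summary reports the chosen model, log likelihood, n, df, BIC,
# ICL, and the number of points assigned to each cluster
summary(fit)
```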
- log.likelihood (-121): This is the log likelihood of the model at the BIC optimum
- n (149): This is the number of data points
- df (37): This is the number of estimated parameters (degrees of freedom)
- BIC (-427): This is the Bayesian information criterion; this is the optimal value found
- ICL (-427): This is the Integrated Complete-data Likelihood, a classification version of the BIC. As ICL and BIC have the same value here, the data points are classified with little uncertainty.
We can plot the results for a visual verification as follows:
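Depending on the package version, plot either cycles through the four diagnostic plots or prompts for which one to draw:

```r
# produces the BIC, classification, uncertainty, and density plots
plot(fit)
```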
You will notice that the plot command for EM produces the following four plots (as shown in the graph):
- The BIC values used for choosing the number of clusters
- A plot of the clustering
- A plot of the classification uncertainty
- The orbital plot of the clusters (a plot of the density)
The first plot depicts the BIC values versus the number of components for the different model names; in this case, we should probably not use VEV, for example:
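To draw just this panel, the what argument of the plot method can be used (a sketch):

```r
# BIC values for each model name across the candidate cluster counts
plot(fit, what = "BIC")
```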
The second plot compares each component (column) of the dataset against every other component to show the clustering that would result from each pair. The idea is to select the components that give you the best clustering of your data. This is one of those cases where your familiarity with the data is key to selecting the appropriate data points for clustering.
In this case, I think selecting X5.1 and X1.4 yields the tightest clusters, as shown in the following graph:
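To reproduce a single panel of this pairs view, the dimens argument selects the features; columns 1 and 3 correspond to X5.1 and X1.4 under the UCI read shown earlier (a sketch):

```r
# classification plot restricted to the X5.1 / X1.4 pair
plot(fit, what = "classification", dimens = c(1, 3))
```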
The third plot gives another iteration of the clustering effects of the different choices, highlighting the main cluster by eliminating from the plot any points that clearly belong to it, as shown here:
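The corresponding single-panel call (a sketch):

```r
# larger symbols mark the points whose cluster assignment is uncertain
plot(fit, what = "uncertainty")
```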
The final, fourth plot gives an orbital view of each of the clusters, highlighting where points might appear relative to the center of each cluster, as shown here:
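And the density panel on its own (a sketch):

```r
# contour (density) plot of the fitted mixture
plot(fit, what = "density")
```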