Introduction to XLMiner™: Data Reduction and Exploration
XLMiner and Microsoft Office are registered trademarks of their respective owners.
Data Exploration and Reduction
Data exploration and reduction are used when the data set to be mined is very large and may contain many variables that are highly correlated with one another or unrelated to the outcome of interest. Using the tools in XLMiner, one can reduce the size of the data set or explore it to formulate hypotheses worth testing. There are two techniques for this purpose:
Principal Component Analysis: PCA is a mathematical procedure that transforms a number of correlated variables into a smaller number of uncorrelated variables, called principal components. The resulting data set has fewer variables but keeps most of the variability of the data, because the first principal component accounts for the largest share of the variation and each later component accounts for progressively less.
Cluster Analysis: Cluster analysis is also called data segmentation. Its primary objective is to assign objects to clusters such that objects within a cluster are markedly similar and objects in different clusters are markedly different.
http://dataminingtools.net
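The idea that the first principal component captures the largest share of the variation can be sketched in a few lines of plain Python. This is only an illustration, not XLMiner's procedure: it assumes the covariance matrix is used, finds the first component by power iteration, and runs on made-up data in which the second variable is exactly twice the first. All function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_principal_component(rows, iterations=200):
    """Return (direction, share_of_variance) for the first component."""
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        # Power iteration: multiply by the covariance matrix, then rescale.
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v, divided by the total variance of all variables.
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    total = sum(cov[a][a] for a in range(p))
    return v, variance / total

# Made-up data: two perfectly correlated variables (y = 2 * x).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, share = first_principal_component(data)
print("direction:", [round(x, 3) for x in direction])
print("share of variance:", round(share, 3))
```

Because the two variables move together exactly, a single component carries essentially all of the variation, which is the situation in which dropping variables loses little information.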
Data Exploration and Reduction - Principal Component Analysis
Data Exploration and Reduction
Fixed # components: You can specify a fixed number of components here.
Smallest # components explaining: This option lets you specify a percentage, and XLMiner will calculate the minimum number of principal components required to account for that percentage of the variance. Do not select it here.
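The "smallest number of components explaining a percentage" rule amounts to a cumulative sum over the component variances. A minimal sketch, assuming the variances of the components are already known; the function name and the example numbers are hypothetical and not taken from XLMiner:

```python
def components_needed(variances, target_percent):
    """Smallest number of leading components whose combined share of the
    total variance reaches target_percent."""
    ordered = sorted(variances, reverse=True)  # largest variance first
    total = sum(ordered)
    cumulative = 0.0
    for count, variance in enumerate(ordered, start=1):
        cumulative += variance
        if 100.0 * cumulative / total >= target_percent:
            return count
    return len(ordered)

# Hypothetical component variances, for illustration only.
example_variances = [4.0, 3.0, 2.0, 1.0]
print(components_needed(example_variances, 85.0))
```

With these numbers the cumulative shares are 40%, 70%, 90% and 100%, so three components are the fewest that reach 85%.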
Data Exploration and Reduction - Output
Data Exploration and Reduction - Cluster Analysis
Cluster analysis can be done in two ways:
1. k-Means Clustering: In k-means clustering, the number of clusters, k, is chosen in advance. Each object is assigned to the nearest of k cluster centers, the centers are recomputed from their members, and these two steps repeat until the assignments settle or the iteration limit is reached.
2. Hierarchical Cluster Analysis: Hierarchical clustering can itself be done in two ways, agglomerative and divisive. In agglomerative clustering, as the name suggests, individual objects are combined step by step into groups of similar objects. In divisive clustering, all objects start in a single group that is successively split into finer groups.
Data Exploration and Reduction – K-Means Clustering
Select the variables to use as input. Deselect the entries that contain headers or labels (here, the TYPE variable).
Data Exploration and Reduction – K-Means Clustering
Enter the number of clusters you want the data set to be divided into and the number of iterations to be performed while creating the clusters. You may also specify the number of starts and the seed.
Data Exploration and Reduction – K-Means Clustering (Output)
XLMiner calculates the sum of squared distances for each start and chooses the start with the smallest value as the best starting point.
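The roles of the number of clusters, iterations, starts and seed can be sketched as follows. This is a plain k-means written for illustration, not XLMiner's implementation: it assumes each start picks k random rows as initial centers and that the best start is the one with the smallest sum of squared distances. The function names, default values and sample rows are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(row, centers):
    """Index of the center closest to the row."""
    return min(range(len(centers)),
               key=lambda c: squared_distance(row, centers[c]))

def run_kmeans(rows, k, iterations, rng):
    """One k-means run from random starting centers.
    Returns (centers, labels, sum_of_squared_distances)."""
    centers = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iterations):
        labels = [nearest_center(r, centers) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest_center(r, centers) for r in rows]
    sse = sum(squared_distance(r, centers[lab]) for r, lab in zip(rows, labels))
    return centers, labels, sse

def best_of_starts(rows, k, iterations=10, starts=5, seed=12345):
    """Repeat k-means from several random starts and keep the run with
    the smallest sum of squared distances."""
    rng = random.Random(seed)  # a fixed seed makes the result repeatable
    runs = [run_kmeans(rows, k, iterations, rng) for _ in range(starts)]
    return min(runs, key=lambda run: run[2])

# Made-up rows, for illustration only.
sample_rows = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
               [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
centers, labels, sse = best_of_starts(sample_rows, k=2)
print("labels:", labels, "sum of squared distances:", round(sse, 3))
```

Fixing the seed means that repeating the call with the same settings reproduces the same starts and therefore the same clusters.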
Data Exploration and Reduction – K-Means Clustering (Output)
This shows the distance of each row from each cluster. Note that each row is placed in the cluster from which it has the least distance.
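A table of this kind can be reproduced with a short sketch: compute the distance from each row to every cluster center and pick the smallest. The rows and centers below are made up, Euclidean distance is assumed, and `distance_table` is a hypothetical helper name.

```python
import math

def distance_table(rows, centers):
    """For each row, return (distances_to_all_centers, index_of_nearest)."""
    table = []
    for row in rows:
        distances = [math.dist(row, center) for center in centers]
        nearest = distances.index(min(distances))
        table.append((distances, nearest))
    return table

# Hypothetical rows and cluster centers, for illustration only.
rows = [[0.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
centers = [[0.0, 0.0], [10.0, 10.0]]
for distances, nearest in distance_table(rows, centers):
    # Clusters are numbered from 1 in the printout.
    print([round(d, 2) for d in distances], "-> cluster", nearest + 1)
```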
Data Exploration and Reduction – Hierarchical Clustering
Hierarchical clustering builds clusters in successive stages rather than in a single pass. It can be done in two ways, agglomerative and divisive. In agglomerative clustering, as the name suggests, individual objects are combined step by step into groups of similar objects. In divisive clustering, the whole data set starts as one group and is split into two; each resulting group is split again, and the process continues until the required number of clusters is formed.
Data Exploration and Reduction – Hierarchical Clustering
Data Exploration and Reduction – Hierarchical Clustering
Select “Normalize Data” and then select any one of the five clustering procedures available.
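Normalizing matters because distance-based clustering is otherwise dominated by whichever variable has the largest scale. The slides do not say how XLMiner normalizes, so the sketch below assumes the common choice of rescaling each column to mean 0 and standard deviation 1; the function name and the data are hypothetical.

```python
import statistics

def normalize_columns(rows):
    """Rescale every column to mean 0 and sample standard deviation 1.
    Assumes no column is constant (a constant column would divide by zero)."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, stdevs)]
            for row in rows]

# Made-up data: the second variable is on a much larger scale than the first.
raw = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
normalized = normalize_columns(raw)
print(normalized)
```

After rescaling, both columns span the same range, so neither one dominates the distance between two cases.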
Data Exploration and Reduction – Hierarchical Clustering
This output details the history of the cluster formation. Initially, each individual case is considered its own cluster (with just itself as a member), so we start off with # clusters = # cases (21 in the example above).
At stage 1, above, clusters (i.e. cases) 10 and 13 were found to be closer together than any other two clusters (i.e. cases), so they are joined together in a cluster called Cluster 10. So now we have one cluster that has two cases (cases 10 and 13), and 19 other clusters that still have just one case in each.
At stage 2, clusters 7 and 12 are found to be closer together than any other two clusters, so they are joined together into Cluster 7. The cluster ID is thus the lowest case number of the cases belonging to that cluster.
This process continues until there is just one cluster. At various stages of the clustering process, there are different numbers of clusters. A graph called a dendrogram lets you visualize this:
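The merge history described above can be imitated with a small agglomerative sketch. It is an illustration under stated assumptions rather than XLMiner's code: it uses single linkage (the distance between two clusters is the distance between their closest members), which is only one possible procedure, and it runs on five made-up cases instead of the 21 in the example. The merged cluster keeps the lower of the two IDs, matching the naming rule in the slide.

```python
import math

def agglomeration_history(cases):
    """Single-linkage agglomerative clustering.
    `cases` maps a case number to its coordinates. Returns one
    (stage, kept_id, absorbed_id, distance) tuple per merge."""
    clusters = {case_id: [case_id] for case_id in cases}
    history = []
    stage = 0
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(cases[x], cases[y])
                        for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a].extend(clusters.pop(b))  # a < b, so the lower ID survives
        stage += 1
        history.append((stage, a, b, d))
    return history

# Hypothetical cases, for illustration only.
cases = {1: [0.0, 0.0], 2: [0.0, 1.0], 3: [5.0, 5.0],
         4: [5.0, 5.5], 5: [10.0, 0.0]}
history = agglomeration_history(cases)
for stage, kept, absorbed, distance in history:
    print(f"stage {stage}: cluster {absorbed} joins cluster {kept} "
          f"at distance {distance:.2f}")
```

Here cases 3 and 4 are the closest pair, so they merge first into cluster 3; the process then continues until a single cluster remains, and the recorded distances are what a dendrogram plots on its height axis.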
Data Exploration and Reduction – Hierarchical Clustering
Data Exploration and Reduction – Hierarchical Clustering
This shows the assignment of cases to clusters (we selected 8 clusters).
Thank you
For more, visit: http://dataminingtools.net

XL-MINER: Data Exploration
