Data Clustering Algorithms and Applications First
Edition Charu C. Aggarwal
Editor(s): Charu C. Aggarwal and Chandan K. Reddy
ISBN(s): 9781315373515, 1315373513
Edition: First edition
Year: 2014
Language: English
Research on the problem of clustering tends to be fragmented across the
pattern recognition, database, data mining, and machine learning communities.
Addressing this problem in a unified way, Data Clustering: Algorithms and
Applications provides complete coverage of the entire area of clustering, from
basic methods to more refined and complex data clustering approaches. It
pays special attention to recent issues in graphs, social networks, and other
domains.
The book focuses on three primary aspects of data clustering:
• Methods, describing key techniques commonly used for clustering, such
as feature selection, agglomerative clustering, partitional clustering,
density-based clustering, probabilistic clustering, grid-based clustering,
spectral clustering, and nonnegative matrix factorization
• Domains, covering methods used for different domains of data, such as
categorical data, text data, multimedia data, graph data, biological data,
stream data, uncertain data, time series clustering, high-dimensional
clustering, and big data
• Variations and Insights, discussing important variations of the clustering
process, such as semisupervised clustering, interactive clustering,
multiview clustering, cluster ensembles, and cluster validation
In this book, top researchers from around the world explore the characteristics
of clustering problems in a variety of application areas. They also explain how
to glean detailed insight from the clustering process—including how to verify
the quality of the underlying clusters—through supervision, human intervention,
or the automated generation of alternative clusters.
Data Mining
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
© 2014 by Taylor & Francis Group, LLC
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.
AIMS AND SCOPE
This series aims to capture new developments and applications in data mining and knowledge
discovery, while summarizing the computational tools and techniques useful in data analysis. This
series encourages the integration of mathematical, statistical, and computational methods and
techniques through the publication of a broad range of textbooks, reference works, and handbooks.
The inclusion of concrete examples and applications is highly encouraged. The scope of the
series includes, but is not limited to, titles in the areas of data mining and knowledge discovery
methods and applications, modeling, algorithms, theory and foundations, data and knowledge
visualization, data mining systems and tools, and privacy and security issues.
PUBLISHED TITLES
ADVANCES IN MACHINE LEARNING AND DATA MINING FOR ASTRONOMY
Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi
COMPUTATIONAL INTELLIGENT DATA ANALYSIS FOR SUSTAINABLE DEVELOPMENT
Ting Yu, Nitesh V. Chawla, and Simeon Simoff
COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
CONTRAST DATA MINING: CONCEPTS, ALGORITHMS, AND APPLICATIONS
Guozhu Dong and James Bailey
DATA CLUSTERING: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal and Chandan K. Reddy
DATA CLUSTERING IN C++: AN OBJECT-ORIENTED APPROACH
Guojun Gan
DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada
DATA MINING WITH R: LEARNING WITH CASE STUDIES
Luís Torgo
FOUNDATIONS OF PREDICTIVE ANALYTICS
James Wu and Stephen Coggeshall
GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han
HANDBOOK OF EDUCATIONAL DATA MINING
Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis
INTELLIGENT TECHNOLOGIES FOR WEB APPLICATIONS
Priti Srinivas Sajja and Rajendra Akerkar
INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING:
CONCEPTS AND TECHNIQUES
Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn
KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama
MACHINE LEARNING AND KNOWLEDGE DISCOVERY FOR
ENGINEERING SYSTEMS HEALTH MANAGEMENT
Ashok N. Srivastava and Jiawei Han
MINING SOFTWARE SPECIFICATIONS: METHODOLOGIES AND APPLICATIONS
David Lo, Siau-Cheng Khoo, Jiawei Han, and Chao Liu
MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang
MUSIC DATA MINING
Tao Li, Mitsunori Ogihara, and George Tzanetakis
NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar
PRACTICAL GRAPH MINING WITH R
Nagiza F. Samatova, William Hendrix, John Jenkins, Kanchana Padmanabhan,
and Arpan Chakraborty
RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu
SERVICE-ORIENTED DISTRIBUTED KNOWLEDGE DISCOVERY
Domenico Talia and Paolo Trunfio
SPECTRAL FEATURE SELECTION FOR DATA MINING
Zheng Alan Zhao and Huan Liu
STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
George Fernandez
SUPPORT VECTOR MACHINES: OPTIMIZATION BASED THEORY, ALGORITHMS,
AND EXTENSIONS
Naiyang Deng, Yingjie Tian, and Chunhua Zhang
TEMPORAL DATA MINING
Theophano Mitsa
TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami
THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar
UNDERSTANDING COMPLEX DATASETS:
DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn
DATA
CLUSTERING
Algorithms and
Applications
Edited by
Charu C. Aggarwal
Chandan K. Reddy
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2014 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20130508
International Standard Book Number-13: 978-1-4665-5822-9 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity
of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Editor Biographies
Contributors
1 An Introduction to Cluster Analysis
Charu C. Aggarwal
1.1 Introduction
1.2 Common Techniques Used in Cluster Analysis
1.2.1 Feature Selection Methods
1.2.2 Probabilistic and Generative Models
1.2.3 Distance-Based Algorithms
1.2.4 Density- and Grid-Based Methods
1.2.5 Leveraging Dimensionality Reduction Methods
1.2.5.1 Generative Models for Dimensionality Reduction
1.2.5.2 Matrix Factorization and Co-Clustering
1.2.5.3 Spectral Methods
1.2.6 The High Dimensional Scenario
1.2.7 Scalable Techniques for Cluster Analysis
1.2.7.1 I/O Issues in Database Management
1.2.7.2 Streaming Algorithms
1.2.7.3 The Big Data Framework
1.3 Data Types Studied in Cluster Analysis
1.3.1 Clustering Categorical Data
1.3.2 Clustering Text Data
1.3.3 Clustering Multimedia Data
1.3.4 Clustering Time-Series Data
1.3.5 Clustering Discrete Sequences
1.3.6 Clustering Network Data
1.3.7 Clustering Uncertain Data
1.4 Insights Gained from Different Variations of Cluster Analysis
1.4.1 Visual Insights
1.4.2 Supervised Insights
1.4.3 Multiview and Ensemble-Based Insights
1.4.4 Validation-Based Insights
1.5 Discussion and Conclusions
2 Feature Selection for Clustering: A Review
Salem Alelyani, Jiliang Tang, and Huan Liu
2.1 Introduction
2.1.1 Data Clustering
2.1.2 Feature Selection
2.1.3 Feature Selection for Clustering
2.1.3.1 Filter Model
2.1.3.2 Wrapper Model
2.1.3.3 Hybrid Model
2.2 Feature Selection for Clustering
2.2.1 Algorithms for Generic Data
2.2.1.1 Spectral Feature Selection (SPEC)
2.2.1.2 Laplacian Score (LS)
2.2.1.3 Feature Selection for Sparse Clustering
2.2.1.4 Localized Feature Selection Based on Scatter Separability (LFSBSS)
2.2.1.5 Multicluster Feature Selection (MCFS)
2.2.1.6 Feature Weighting k-Means
2.2.2 Algorithms for Text Data
2.2.2.1 Term Frequency (TF)
2.2.2.2 Inverse Document Frequency (IDF)
2.2.2.3 Term Frequency-Inverse Document Frequency (TF-IDF)
2.2.2.4 Chi Square Statistic
2.2.2.5 Frequent Term-Based Text Clustering
2.2.2.6 Frequent Term Sequence
2.2.3 Algorithms for Streaming Data
2.2.3.1 Text Stream Clustering Based on Adaptive Feature Selection (TSC-AFS)
2.2.3.2 High-Dimensional Projected Stream Clustering (HPStream)
2.2.4 Algorithms for Linked Data
2.2.4.1 Challenges and Opportunities
2.2.4.2 LUFS: An Unsupervised Feature Selection Framework for Linked Data
2.2.4.3 Conclusion and Future Work for Linked Data
2.3 Discussions and Challenges
2.3.1 The Chicken or the Egg Dilemma
2.3.2 Model Selection: K and l
2.3.3 Scalability
2.3.4 Stability
3 Probabilistic Models for Clustering
Hongbo Deng and Jiawei Han
3.1 Introduction
3.2 Mixture Models
3.2.1 Overview
3.2.2 Gaussian Mixture Model
3.2.3 Bernoulli Mixture Model
3.2.4 Model Selection Criteria
3.3 EM Algorithm and Its Variations
3.3.1 The General EM Algorithm
3.3.2 Mixture Models Revisited
3.3.3 Limitations of the EM Algorithm
3.3.4 Applications of the EM Algorithm
3.4 Probabilistic Topic Models
3.4.1 Probabilistic Latent Semantic Analysis
3.4.2 Latent Dirichlet Allocation
3.4.3 Variations and Extensions
3.5 Conclusions and Summary
4 A Survey of Partitional and Hierarchical Clustering Algorithms
Chandan K. Reddy and Bhanukiran Vinzamuri
4.1 Introduction
4.2 Partitional Clustering Algorithms
4.2.1 K-Means Clustering
4.2.2 Minimization of Sum of Squared Errors
4.2.3 Factors Affecting K-Means
4.2.3.1 Popular Initialization Methods
4.2.3.2 Estimating the Number of Clusters
4.2.4 Variations of K-Means
4.2.4.1 K-Medoids Clustering
4.2.4.2 K-Medians Clustering
4.2.4.3 K-Modes Clustering
4.2.4.4 Fuzzy K-Means Clustering
4.2.4.5 X-Means Clustering
4.2.4.6 Intelligent K-Means Clustering
4.2.4.7 Bisecting K-Means Clustering
4.2.4.8 Kernel K-Means Clustering
4.2.4.9 Mean Shift Clustering
4.2.4.10 Weighted K-Means Clustering
4.2.4.11 Genetic K-Means Clustering
4.2.5 Making K-Means Faster
4.3 Hierarchical Clustering Algorithms
4.3.1 Agglomerative Clustering
4.3.1.1 Single and Complete Link
4.3.1.2 Group Averaged and Centroid Agglomerative Clustering
4.3.1.3 Ward’s Criterion
4.3.1.4 Agglomerative Hierarchical Clustering Algorithm
4.3.1.5 Lance–Williams Dissimilarity Update Formula
4.3.2 Divisive Clustering
4.3.2.1 Issues in Divisive Clustering
4.3.2.2 Divisive Hierarchical Clustering Algorithm
4.3.2.3 Minimum Spanning Tree-Based Clustering
4.3.3 Other Hierarchical Clustering Algorithms
4.4 Discussion and Summary
5 Density-Based Clustering
Martin Ester
5.1 Introduction
5.2 DBSCAN
5.3 DENCLUE
5.4 OPTICS
5.5 Other Algorithms
5.6 Subspace Clustering
5.7 Clustering Networks
5.8 Other Directions
5.9 Conclusion
6 Grid-Based Clustering
Wei Cheng, Wei Wang, and Sandra Batista
6.1 Introduction
6.2 The Classical Algorithms
6.2.1 Earliest Approaches: GRIDCLUS and BANG
6.2.2 STING and STING+: The Statistical Information Grid Approach
6.2.3 WaveCluster: Wavelets in Grid-Based Clustering
6.3 Adaptive Grid-Based Algorithms
6.3.1 AMR: Adaptive Mesh Refinement Clustering
6.4 Axis-Shifting Grid-Based Algorithms
6.4.1 NSGC: New Shifting Grid Clustering Algorithm
6.4.2 ADCC: Adaptable Deflect and Conquer Clustering
6.4.3 ASGC: Axis-Shifted Grid-Clustering
6.4.4 GDILC: Grid-Based Density-IsoLine Clustering Algorithm
6.5 High-Dimensional Algorithms
6.5.1 CLIQUE: The Classical High-Dimensional Algorithm
6.5.2 Variants of CLIQUE
6.5.2.1 ENCLUS: Entropy-Based Approach
6.5.2.2 MAFIA: Adaptive Grids in High Dimensions
6.5.3 OptiGrid: Density-Based Optimal Grid Partitioning
6.5.4 Variants of the OptiGrid Approach
6.5.4.1 O-Cluster: A Scalable Approach
6.5.4.2 CBF: Cell-Based Filtering
6.6 Conclusions and Summary
7 Nonnegative Matrix Factorizations for Clustering: A Survey
Tao Li and Chris Ding
7.1 Introduction
7.1.1 Background
7.1.2 NMF Formulations
7.2 NMF for Clustering: Theoretical Foundations
7.2.1 NMF and K-Means Clustering
7.2.2 NMF and Probabilistic Latent Semantic Indexing
7.2.3 NMF and Kernel K-Means and Spectral Clustering
7.2.4 NMF Boundedness Theorem
7.3 NMF Clustering Capabilities
7.3.1 Examples
7.3.2 Analysis
7.4 NMF Algorithms
7.4.1 Introduction
7.4.2 Algorithm Development
7.4.3 Practical Issues in NMF Algorithms
7.4.3.1 Initialization
7.4.3.2 Stopping Criteria
7.4.3.3 Objective Function vs. Clustering Performance
7.4.3.4 Scalability
7.5 NMF Related Factorizations
7.6 NMF for Clustering: Extensions
7.6.1 Co-Clustering
7.6.2 Semisupervised Clustering
7.6.3 Semisupervised Co-Clustering
7.6.4 Consensus Clustering
7.6.5 Graph Clustering
7.6.6 Other Clustering Extensions
7.7 Conclusions
8 Spectral Clustering
Jialu Liu and Jiawei Han
8.1 Introduction
8.2 Similarity Graph
8.3 Unnormalized Spectral Clustering
8.3.1 Notation
8.3.2 Unnormalized Graph Laplacian
8.3.3 Spectrum Analysis
8.3.4 Unnormalized Spectral Clustering Algorithm
8.4 Normalized Spectral Clustering
8.4.1 Normalized Graph Laplacian
8.4.2 Spectrum Analysis
8.4.3 Normalized Spectral Clustering Algorithm
8.5 Graph Cut View
8.5.1 Ratio Cut Relaxation
8.5.2 Normalized Cut Relaxation
8.6 Random Walks View
8.7 Connection to Laplacian Eigenmap
8.8 Connection to Kernel k-Means and Nonnegative Matrix Factorization
8.9 Large Scale Spectral Clustering
8.10 Further Reading
9 Clustering High-Dimensional Data
Arthur Zimek
9.1 Introduction
9.2 The “Curse of Dimensionality”
9.2.1 Different Aspects of the “Curse”
9.2.2 Consequences
9.3 Clustering Tasks in Subspaces of High-Dimensional Data
9.3.1 Categories of Subspaces
9.3.1.1 Axis-Parallel Subspaces
9.3.1.2 Arbitrarily Oriented Subspaces
9.3.1.3 Special Cases
9.3.2 Search Spaces for the Clustering Problem
9.4 Fundamental Algorithmic Ideas
9.4.1 Clustering in Axis-Parallel Subspaces
9.4.1.1 Cluster Model
9.4.1.2 Basic Techniques
9.4.1.3 Clustering Algorithms
9.4.2 Clustering in Arbitrarily Oriented Subspaces
9.4.2.1 Cluster Model
9.4.2.2 Basic Techniques and Example Algorithms
9.5 Open Questions and Current Research Directions
9.6 Conclusion
10 A Survey of Stream Clustering Algorithms
Charu C. Aggarwal
10.1 Introduction
10.2 Methods Based on Partitioning Representatives
10.2.1 The STREAM Algorithm
10.2.2 CluStream: The Microclustering Framework
10.2.2.1 Microcluster Definition
10.2.2.2 Pyramidal Time Frame
10.2.2.3 Online Clustering with CluStream
10.3 Density-Based Stream Clustering
10.3.1 DenStream: Density-Based Microclustering
10.3.2 Grid-Based Streaming Algorithms
10.3.2.1 D-Stream Algorithm
10.3.2.2 Other Grid-Based Algorithms
10.4 Probabilistic Streaming Algorithms
10.5 Clustering High-Dimensional Streams
10.5.1 The HPSTREAM Method
10.5.2 Other High-Dimensional Streaming Algorithms
10.6 Clustering Discrete and Categorical Streams
10.6.1 Clustering Binary Data Streams with k-Means
10.6.2 The StreamCluCD Algorithm
10.6.3 Massive-Domain Clustering
10.7 Text Stream Clustering
10.8 Other Scenarios for Stream Clustering
10.8.1 Clustering Uncertain Data Streams
10.8.2 Clustering Graph Streams
10.8.3 Distributed Clustering of Data Streams
10.9 Discussion and Conclusions
11 Big Data Clustering
Hanghang Tong and U Kang
11.1 Introduction
11.2 One-Pass Clustering Algorithms
11.2.1 CLARANS: Fighting with Exponential Search Space
11.2.2 BIRCH: Fighting with Limited Memory
11.2.3 CURE: Fighting with the Irregular Clusters
11.3 Randomized Techniques for Clustering Algorithms
11.3.1 Locality-Preserving Projection
11.3.2 Global Projection
11.4 Parallel and Distributed Clustering Algorithms
11.4.1 General Framework
11.4.2 DBDC: Density-Based Clustering
11.4.3 ParMETIS: Graph Partitioning
11.4.4 PKMeans: K-Means with MapReduce
11.4.5 DisCo: Co-Clustering with MapReduce
11.4.6 BoW: Subspace Clustering with MapReduce
11.5 Conclusion
© 2014 by Taylor & Francis Group, LLC
12 Clustering Categorical Data 277
Bill Andreopoulos
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
12.2 Goals of Categorical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
12.2.1 Clustering Road Map . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
12.3 Similarity Measures for Categorical Data . . . . . . . . . . . . . . . . . . . . . 282
12.3.1 The Hamming Distance in Categorical and Binary Data . . . . . . . . . 282
12.3.2 Probabilistic Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
12.3.3 Information-Theoretic Measures . . . . . . . . . . . . . . . . . . . . . 283
12.3.4 Context-Based Similarity Measures . . . . . . . . . . . . . . . . . . . . 284
12.4 Descriptions of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
12.4.1 Partition-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 284
12.4.1.1 k-Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
12.4.1.2 k-Prototypes (Mixed Categorical and Numerical) . . . . . . . 285
12.4.1.3 Fuzzy k-Modes . . . . . . . . . . . . . . . . . . . . . . . . . 286
12.4.1.4 Squeezer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
12.4.1.5 COOLCAT . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
12.4.2 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.4.2.1 ROCK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.4.2.2 COBWEB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
12.4.2.3 LIMBO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
12.4.3 Density-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 289
12.4.3.1 Projected (Subspace) Clustering . . . . . . . . . . . . . . . . 290
12.4.3.2 CACTUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
12.4.3.3 CLICKS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
12.4.3.4 STIRR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
12.4.3.5 CLOPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
12.4.3.6 HIERDENC: Hierarchical Density-Based Clustering . . . . . 292
12.4.3.7 MULIC: Multiple Layer Incremental Clustering . . . . . . . . 293
12.4.4 Model-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 296
12.4.4.1 BILCOM Empirical Bayesian (Mixed Categorical and Numerical) . . . . . . 296
12.4.4.2 AutoClass (Mixed Categorical and Numerical) . . . . . . . . 296
12.4.4.3 SVM Clustering (Mixed Categorical and Numerical) . . . . . 297
12.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
13 Document Clustering: The Next Frontier 305
David C. Anastasiu, Andrea Tagarelli, and George Karypis
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
13.2 Modeling a Document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
13.2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
13.2.2 The Vector Space Model . . . . . . . . . . . . . . . . . . . . . . . . . . 307
13.2.3 Alternate Document Models . . . . . . . . . . . . . . . . . . . . . . . . 309
13.2.4 Dimensionality Reduction for Text . . . . . . . . . . . . . . . . . . . . 309
13.2.5 Characterizing Extremes . . . . . . . . . . . . . . . . . . . . . . . . . . 310
13.3 General Purpose Document Clustering . . . . . . . . . . . . . . . . . . . . . . . 311
13.3.1 Similarity/Dissimilarity-Based Algorithms . . . . . . . . . . . . . . . . 311
13.3.2 Density-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 312
13.3.3 Adjacency-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . . 313
13.3.4 Generative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
13.4 Clustering Long Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
13.4.1 Document Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 315
13.4.2 Clustering Segmented Documents . . . . . . . . . . . . . . . . . . . . . 317
13.4.3 Simultaneous Segment Identification and Clustering . . . . . . . . . . . 321
13.5 Clustering Short Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
13.5.1 General Methods for Short Document Clustering . . . . . . . . . . . . . 323
13.5.2 Clustering with Knowledge Infusion . . . . . . . . . . . . . . . . . . . 324
13.5.3 Clustering Web Snippets . . . . . . . . . . . . . . . . . . . . . . . . . . 325
13.5.4 Clustering Microblogs . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
13.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
14 Clustering Multimedia Data 339
Shen-Fu Tsai, Guo-Jun Qi, Shiyu Chang, Min-Hsuan Tsai, and Thomas S. Huang
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
14.2 Clustering with Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
14.2.1 Visual Words Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
14.2.2 Face Clustering and Annotation . . . . . . . . . . . . . . . . . . . . . . 342
14.2.3 Photo Album Event Recognition . . . . . . . . . . . . . . . . . . . . . 343
14.2.4 Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
14.2.5 Large-Scale Image Classification . . . . . . . . . . . . . . . . . . . . . 345
14.3 Clustering with Video and Audio Data . . . . . . . . . . . . . . . . . . . . . . . 347
14.3.1 Video Summarization . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
14.3.2 Video Event Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
14.3.3 Video Story Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
14.3.4 Music Summarization . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
14.4 Clustering with Multimodal Data . . . . . . . . . . . . . . . . . . . . . . . . . . 351
14.5 Summary and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . 353
15 Time-Series Data Clustering 357
Dimitrios Kotsakos, Goce Trajcevski, Dimitrios Gunopulos, and Charu C. Aggarwal
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
15.2 The Diverse Formulations for Time-Series Clustering . . . . . . . . . . . . . . . 359
15.3 Online Correlation-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . . 360
15.3.1 Selective Muscles and Related Methods . . . . . . . . . . . . . . . . . . 361
15.3.2 Sensor Selection Algorithms for Correlation Clustering . . . . . . . . . 362
15.4 Similarity and Distance Measures . . . . . . . . . . . . . . . . . . . . . . . . . 363
15.4.1 Univariate Distance Measures . . . . . . . . . . . . . . . . . . . . . . . 363
15.4.1.1 Lp Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
15.4.1.2 Dynamic Time Warping Distance . . . . . . . . . . . . . . . 364
15.4.1.3 EDIT Distance . . . . . . . . . . . . . . . . . . . . . . . . . 365
15.4.1.4 Longest Common Subsequence . . . . . . . . . . . . . . . . 365
15.4.2 Multivariate Distance Measures . . . . . . . . . . . . . . . . . . . . . . 366
15.4.2.1 Multidimensional Lp Distance . . . . . . . . . . . . . . . . . 366
15.4.2.2 Multidimensional DTW . . . . . . . . . . . . . . . . . . . . . 367
15.4.2.3 Multidimensional LCSS . . . . . . . . . . . . . . . . . . . . 368
15.4.2.4 Multidimensional Edit Distance . . . . . . . . . . . . . . . . 368
15.4.2.5 Multidimensional Subsequence Matching . . . . . . . . . . . 368
15.5 Shape-Based Time-Series Clustering Techniques . . . . . . . . . . . . . . . . . 369
15.5.1 k-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
15.5.2 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 371
15.5.3 Density-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 372
15.5.4 Trajectory Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
15.6 Time-Series Clustering Applications . . . . . . . . . . . . . . . . . . . . . . . . 374
15.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
16 Clustering Biological Data 381
Chandan K. Reddy, Mohammad Al Hasan, and Mohammed J. Zaki
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
16.2 Clustering Microarray Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
16.2.1 Proximity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
16.2.2 Categorization of Algorithms . . . . . . . . . . . . . . . . . . . . . . . 384
16.2.3 Standard Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . . 385
16.2.3.1 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . 385
16.2.3.2 Probabilistic Clustering . . . . . . . . . . . . . . . . . . . . . 386
16.2.3.3 Graph-Theoretic Clustering . . . . . . . . . . . . . . . . . . . 386
16.2.3.4 Self-Organizing Maps . . . . . . . . . . . . . . . . . . . . . . 387
16.2.3.5 Other Clustering Methods . . . . . . . . . . . . . . . . . . . 387
16.2.4 Biclustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
16.2.4.1 Types and Structures of Biclusters . . . . . . . . . . . . . . . 389
16.2.4.2 Biclustering Algorithms . . . . . . . . . . . . . . . . . . . . 390
16.2.4.3 Recent Developments . . . . . . . . . . . . . . . . . . . . . . 391
16.2.5 Triclustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
16.2.6 Time-Series Gene Expression Data Clustering . . . . . . . . . . . . . . 392
16.2.7 Cluster Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
16.3 Clustering Biological Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 394
16.3.1 Characteristics of PPI Network Data . . . . . . . . . . . . . . . . . . . 394
16.3.2 Network Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . . 394
16.3.2.1 Molecular Complex Detection . . . . . . . . . . . . . . . . . 394
16.3.2.2 Markov Clustering . . . . . . . . . . . . . . . . . . . . . . . 395
16.3.2.3 Neighborhood Search Methods . . . . . . . . . . . . . . . . . 395
16.3.2.4 Clique Percolation Method . . . . . . . . . . . . . . . . . . . 395
16.3.2.5 Ensemble Clustering . . . . . . . . . . . . . . . . . . . . . . 396
16.3.2.6 Other Clustering Methods . . . . . . . . . . . . . . . . . . . 396
16.3.3 Cluster Validation and Challenges . . . . . . . . . . . . . . . . . . . . . 397
16.4 Biological Sequence Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
16.4.1 Sequence Similarity Metrics . . . . . . . . . . . . . . . . . . . . . . . . 397
16.4.1.1 Alignment-Based Similarity . . . . . . . . . . . . . . . . . . 398
16.4.1.2 Keyword-Based Similarity . . . . . . . . . . . . . . . . . . . 398
16.4.1.3 Kernel-Based Similarity . . . . . . . . . . . . . . . . . . . . 399
16.4.1.4 Model-Based Similarity . . . . . . . . . . . . . . . . . . . . . 399
16.4.2 Sequence Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . 399
16.4.2.1 Subsequence-Based Clustering . . . . . . . . . . . . . . . . . 399
16.4.2.2 Graph-Based Clustering . . . . . . . . . . . . . . . . . . . . 400
16.4.2.3 Probabilistic Models . . . . . . . . . . . . . . . . . . . . . . 402
16.4.2.4 Suffix Tree and Suffix Array-Based Method . . . . . . . . . . 403
16.5 Software Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
16.6 Discussion and Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
17 Network Clustering 415
Srinivasan Parthasarathy and S M Faisal
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
17.2 Background and Nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
17.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
17.4 Common Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
17.5 Partitioning with Geometric Information . . . . . . . . . . . . . . . . . . . . . . 419
17.5.1 Coordinate Bisection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
17.5.2 Inertial Bisection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
17.5.3 Geometric Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
17.6 Graph Growing and Greedy Algorithms . . . . . . . . . . . . . . . . . . . . . . 421
17.6.1 Kernighan-Lin Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 422
17.7 Agglomerative and Divisive Clustering . . . . . . . . . . . . . . . . . . . . . . . 423
17.8 Spectral Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
17.8.1 Similarity Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
17.8.2 Types of Similarity Graphs . . . . . . . . . . . . . . . . . . . . . . . . 425
17.8.3 Graph Laplacians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
17.8.3.1 Unnormalized Graph Laplacian . . . . . . . . . . . . . . . . 426
17.8.3.2 Normalized Graph Laplacians . . . . . . . . . . . . . . . . . 427
17.8.4 Spectral Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . . 427
17.9 Markov Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
17.9.1 Regularized MCL (RMCL): Improvement over MCL . . . . . . . . . . 429
17.10 Multilevel Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
17.11 Local Partitioning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
17.12 Hypergraph Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
17.13 Emerging Methods for Partitioning Special Graphs . . . . . . . . . . . . . . . . 435
17.13.1 Bipartite Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
17.13.2 Dynamic Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
17.13.3 Heterogeneous Networks . . . . . . . . . . . . . . . . . . . . . . . . . 437
17.13.4 Directed Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
17.13.5 Combining Content and Relationship Information . . . . . . . . . . . . 439
17.13.6 Networks with Overlapping Communities . . . . . . . . . . . . . . . . 440
17.13.7 Probabilistic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
17.14 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
18 A Survey of Uncertain Data Clustering Algorithms 457
Charu C. Aggarwal
18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
18.2 Mixture Model Clustering of Uncertain Data . . . . . . . . . . . . . . . . . . . . 459
18.3 Density-Based Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . . . 460
18.3.1 FDBSCAN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
18.3.2 FOPTICS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
18.4 Partitional Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 462
18.4.1 The UK-Means Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 462
18.4.2 The CK-Means Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 463
18.4.3 Clustering Uncertain Data with Voronoi Diagrams . . . . . . . . . . . . 464
18.4.4 Approximation Algorithms for Clustering Uncertain Data . . . . . . . . 464
18.4.5 Speeding Up Distance Computations . . . . . . . . . . . . . . . . . . . 465
18.5 Clustering Uncertain Data Streams . . . . . . . . . . . . . . . . . . . . . . . . . 466
18.5.1 The UMicro Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 466
18.5.2 The LuMicro Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 471
18.5.3 Enhancements to Stream Clustering . . . . . . . . . . . . . . . . . . . . 471
18.6 Clustering Uncertain Data in High Dimensionality . . . . . . . . . . . . . . . . . 472
18.6.1 Subspace Clustering of Uncertain Data . . . . . . . . . . . . . . . . . . 473
18.6.2 UPStream: Projected Clustering of Uncertain Data Streams . . . . . . . 474
18.7 Clustering with the Possible Worlds Model . . . . . . . . . . . . . . . . . . . . 477
18.8 Clustering Uncertain Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
18.9 Conclusions and Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
19 Concepts of Visual and Interactive Clustering 483
Alexander Hinneburg
19.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
19.2 Direct Visual and Interactive Clustering . . . . . . . . . . . . . . . . . . . . . . 484
19.2.1 Scatterplots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
19.2.2 Parallel Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
19.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
19.3 Visual Interactive Steering of Clustering . . . . . . . . . . . . . . . . . . . . . . 491
19.3.1 Visual Assessment of Convergence of Clustering Algorithm . . . . . . . 491
19.3.2 Interactive Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . 492
19.3.3 Visual Clustering with SOMs . . . . . . . . . . . . . . . . . . . . . . . 494
19.3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
19.4 Interactive Comparison and Combination of Clusterings . . . . . . . . . . . . . . 495
19.4.1 Space of Clusterings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
19.4.2 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
19.4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
19.5 Visualization of Clusters for Sense-Making . . . . . . . . . . . . . . . . . . . . 497
19.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
20 Semisupervised Clustering 505
Amrudin Agovic and Arindam Banerjee
20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
20.2 Clustering with Pointwise and Pairwise Semisupervision . . . . . . . . . . . . . 507
20.2.1 Semisupervised Clustering Based on Seeding . . . . . . . . . . . . . . . 507
20.2.2 Semisupervised Clustering Based on Pairwise Constraints . . . . . . . . 508
20.2.3 Active Learning for Semisupervised Clustering . . . . . . . . . . . . . . 511
20.2.4 Semisupervised Clustering Based on User Feedback . . . . . . . . . . . 512
20.2.5 Semisupervised Clustering Based on Nonnegative Matrix Factorization . 513
20.3 Semisupervised Graph Cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
20.3.1 Semisupervised Unnormalized Cut . . . . . . . . . . . . . . . . . . . . 515
20.3.2 Semisupervised Ratio Cut . . . . . . . . . . . . . . . . . . . . . . . . . 515
20.3.3 Semisupervised Normalized Cut . . . . . . . . . . . . . . . . . . . . . . 516
20.4 A Unified View of Label Propagation . . . . . . . . . . . . . . . . . . . . . . . 517
20.4.1 Generalized Label Propagation . . . . . . . . . . . . . . . . . . . . . . 517
20.4.2 Gaussian Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
20.4.3 Tikhonov Regularization (TIKREG) . . . . . . . . . . . . . . . . . . . 518
20.4.4 Local and Global Consistency . . . . . . . . . . . . . . . . . . . . . . . 518
20.4.5 Related Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
20.4.5.1 Cluster Kernels . . . . . . . . . . . . . . . . . . . . . . . . . 519
20.4.5.2 Gaussian Random Walks EM (GWEM) . . . . . . . . . . . . 519
20.4.5.3 Linear Neighborhood Propagation . . . . . . . . . . . . . . . 520
20.4.6 Label Propagation and Green’s Function . . . . . . . . . . . . . . . . . 521
20.4.7 Label Propagation and Semisupervised Graph Cuts . . . . . . . . . . . . 521
20.5 Semisupervised Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
20.5.1 Nonlinear Manifold Embedding . . . . . . . . . . . . . . . . . . . . . . 522
20.5.2 Semisupervised Embedding . . . . . . . . . . . . . . . . . . . . . . . . 522
20.5.2.1 Unconstrained Semisupervised Embedding . . . . . . . . . . 523
20.5.2.2 Constrained Semisupervised Embedding . . . . . . . . . . . . 523
20.6 Comparative Experimental Analysis . . . . . . . . . . . . . . . . . . . . . . . . 524
20.6.1 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
20.6.2 Semisupervised Embedding Methods . . . . . . . . . . . . . . . . . . . 529
20.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
21 Alternative Clustering Analysis: A Review 535
James Bailey
21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
21.2 Technical Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
21.3 Multiple Clustering Analysis Using Alternative Clusterings . . . . . . . . . . . . 538
21.3.1 Alternative Clustering Algorithms: A Taxonomy . . . . . . . . . . . . . 538
21.3.2 Unguided Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
21.3.2.1 Naive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
21.3.2.2 Meta Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 539
21.3.2.3 Eigenvectors of the Laplacian Matrix . . . . . . . . . . . . . . 540
21.3.2.4 Decorrelated k-Means and Convolutional EM . . . . . . . . . 540
21.3.2.5 CAMI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
21.3.3 Guided Generation with Constraints . . . . . . . . . . . . . . . . . . . . 541
21.3.3.1 COALA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
21.3.3.2 Constrained Optimization Approach . . . . . . . . . . . . . . 541
21.3.3.3 MAXIMUS . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
21.3.4 Orthogonal Transformation Approaches . . . . . . . . . . . . . . . . . 543
21.3.4.1 Orthogonal Views . . . . . . . . . . . . . . . . . . . . . . . . 543
21.3.4.2 ADFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
21.3.5 Information Theoretic . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
21.3.5.1 Conditional Information Bottleneck (CIB) . . . . . . . . . . . 544
21.3.5.2 Conditional Ensemble Clustering . . . . . . . . . . . . . . . . 544
21.3.5.3 NACI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
21.3.5.4 mSC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
21.4 Connections to Multiview Clustering and Subspace Clustering . . . . . . . . . . 545
21.5 Future Research Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
21.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
22 Cluster Ensembles: Theory and Applications 551
Joydeep Ghosh and Ayan Acharya
22.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
22.2 The Cluster Ensemble Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
22.3 Measuring Similarity Between Clustering Solutions . . . . . . . . . . . . . . . . 555
22.4 Cluster Ensemble Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
22.4.1 Probabilistic Approaches to Cluster Ensembles . . . . . . . . . . . . . . 558
22.4.1.1 A Mixture Model for Cluster Ensembles (MMCE) . . . . . . 558
22.4.1.2 Bayesian Cluster Ensembles (BCE) . . . . . . . . . . . . . . 558
22.4.1.3 Nonparametric Bayesian Cluster Ensembles (NPBCE) . . . . 559
22.4.2 Pairwise Similarity-Based Approaches . . . . . . . . . . . . . . . . . . 560
22.4.2.1 Methods Based on Ensemble Co-Association Matrix . . . . . 560
22.4.2.2 Relating Consensus Clustering to Other Optimization Formulations . . . . . . 562
22.4.3 Direct Approaches Using Cluster Labels . . . . . . . . . . . . . . . . . 562
22.4.3.1 Graph Partitioning . . . . . . . . . . . . . . . . . . . . . . . 562
22.4.3.2 Cumulative Voting . . . . . . . . . . . . . . . . . . . . . . . 563
22.5 Applications of Consensus Clustering . . . . . . . . . . . . . . . . . . . . . . . 564
22.5.1 Gene Expression Data Analysis . . . . . . . . . . . . . . . . . . . . . . 564
22.5.2 Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
22.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
23 Clustering Validation Measures 571
Hui Xiong and Zhongmou Li
23.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
23.2 External Clustering Validation Measures . . . . . . . . . . . . . . . . . . . . . . 573
23.2.1 An Overview of External Clustering Validation Measures . . . . . . . . 574
23.2.2 Defective Validation Measures . . . . . . . . . . . . . . . . . . . . . . 575
23.2.2.1 K-Means: The Uniform Effect . . . . . . . . . . . . . . . . . 575
23.2.2.2 A Necessary Selection Criterion . . . . . . . . . . . . . . . . 576
23.2.2.3 The Cluster Validation Results . . . . . . . . . . . . . . . . . 576
23.2.2.4 The Issues with the Defective Measures . . . . . . . . . . . . 577
23.2.2.5 Improving the Defective Measures . . . . . . . . . . . . . . . 577
23.2.3 Measure Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . 577
23.2.3.1 Normalizing the Measures . . . . . . . . . . . . . . . . . . . 578
23.2.3.2 The DCV Criterion . . . . . . . . . . . . . . . . . . . . . . . 581
23.2.3.3 The Effect of Normalization . . . . . . . . . . . . . . . . . . 583
23.2.4 Measure Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
23.2.4.1 The Consistency Between Measures . . . . . . . . . . . . . . 584
23.2.4.2 Properties of Measures . . . . . . . . . . . . . . . . . . . . . 586
23.2.4.3 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
23.3 Internal Clustering Validation Measures . . . . . . . . . . . . . . . . . . . . . . 589
23.3.1 An Overview of Internal Clustering Validation Measures . . . . . . . . . 589
23.3.2 Understanding of Internal Clustering Validation Measures . . . . . . . . 592
23.3.2.1 The Impact of Monotonicity . . . . . . . . . . . . . . . . . . 592
23.3.2.2 The Impact of Noise . . . . . . . . . . . . . . . . . . . . . . 593
23.3.2.3 The Impact of Density . . . . . . . . . . . . . . . . . . . . . 594
23.3.2.4 The Impact of Subclusters . . . . . . . . . . . . . . . . . . . 595
23.3.2.5 The Impact of Skewed Distributions . . . . . . . . . . . . . . 596
23.3.2.6 The Impact of Arbitrary Shapes . . . . . . . . . . . . . . . . 598
23.3.3 Properties of Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
23.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
24 Educational and Software Resources for Data Clustering 607
Charu C. Aggarwal and Chandan K. Reddy
24.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
24.2 Educational Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
24.2.1 Books on Data Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 608
24.2.2 Popular Survey Papers on Data Clustering . . . . . . . . . . . . . . . . 608
24.3 Software for Data Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
24.3.1 Free and Open-Source Software . . . . . . . . . . . . . . . . . . . . . . 610
24.3.1.1 General Clustering Software . . . . . . . . . . . . . . . . . . 610
24.3.1.2 Specialized Clustering Software . . . . . . . . . . . . . . . . 610
The Project Gutenberg eBook of The Good
Englishwoman
This ebook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this ebook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.
Title: The Good Englishwoman
Author: Orlo Williams
Release date: October 6, 2018 [eBook #58041]
Language: English
Credits: Produced by MFR, Les Galloway and the Online Distributed
Proofreading Team at http://www.pgdp.net (This file was
produced from images generously made available by The
Internet Archive/American Libraries.)
*** START OF THE PROJECT GUTENBERG EBOOK THE GOOD
ENGLISHWOMAN ***
Transcriber’s Notes
Obvious typographical errors have been silently corrected. Variations in hyphenation
have been standardised but all other spelling and punctuation remains unchanged.
The half title immediately before the title page has been omitted.
THE
GOOD ENGLISHWOMAN
BY
ORLO WILLIAMS, M.C.
Author of “Vie de Boheme: A Patch of Romantic Paris,”
“The Life and Letters of John Rickman,” etc.
LONDON
GRANT RICHARDS LTD.
ST MARTIN’S STREET
MDCCCCXX
PRINTED IN GREAT BRITAIN BY
THE DUNEDIN PRESS LIMITED, EDINBURGH
TO BETTY
WHEN SHE IS OLDER
WITH THE SUPERFLUOUS INJUNCTION
NOT TO TAKE THIS BOOK
TOO SERIOUSLY
CONTENTS
CHAPTER PAGE
I. The Man in the Sidecar 9
II. Little Girls 29
III. Big Girls 51
IV. The English Wife 76
V. The English Mother 102
VI. The Englishwoman’s Mind 128
VII. The Englishwoman’s Manners 145
VIII. The Englishwoman and the Arts 166
IX. The Englishwoman in Society 187
X. The Englishwoman at Work 204
XI. The Englishwoman at Play 219
XII. The Englishwoman in Parliament 234
CHAPTER I
A FEW REMARKS FROM THE MAN IN THE
SIDECAR
My uncle Joseph, a solitary man, once broke the silence of a
country walk by asserting with explosive emphasis: “I don’t see how
any man can understand women.” I assented vaguely, and he went
on: “How can we ever grasp their point of view, my dear boy, which
is so totally different from ours? How can we understand the outlook
on life of beings whose instincts, training, purpose, ambitions have
so little resemblance to ours? For my part I have given up trying: it
is a waste of time. Never let a woman flatter you into thinking that
you understand her: she is trying to make you her tool. The
Egyptians gave the Sphinx a woman’s face and they were right.
Women are so mysterious.” And the south-west wind took up his
words and whispered them to the trees, which nodded their heads
and waved their branches, rustling “mysterious, mysterious” in all
their leaves.
I do not argue with my uncle Joseph, especially on a country walk
when the south-west wind is blowing. So I took out my pipe and lit it
in spite of the south-west wind, saying to myself: “You silly wind,
you silly trees, you know nothing of wisdom. You would catch up
anything that my uncle Joseph said and make it seem important.”
And the south-west wind solemnly breathed “important” into the ear
of a little quarry, in the tone of a ripe family butler. “There is just as
much, and just as little, mystery about men and women as there is
about you. It depends how much one wants to know. So far as there
is any mystery, as a matter of fact, it is much more on the side of
men, who are far more incalculable, far more complex than women
in their motives and reactions. But men are lazy, you silly old things,
and it saves a lot of trouble to invent a mystery and give it up rather
than sit down before a problem to study it. Men have thousands of
other things to think about besides women, but women, who have
not the same variety, are so devilish insistent, that they would keep
men thinking about them all their time if they could. So, in self-
defence, men have pacified the dear things by calling them
mysterious, which is highly flattering, and by giving them up for
three-quarters of their days. Uncle Joseph has probably been
arguing unsuccessfully with Aunt Georgiana, as he always will,
because he never took the trouble to master her mental and
emotional processes. But that does not prove the general truth of his
proposition. His is just the mind which grows those weeds of
everyday thought the seeds of which thoughtless south-west winds
blow about as they do the seeds of thistles. Go off and blow those
clouds away, you reverberator of commonplaces.”
Throwing up his hands with a shriek of “commonplaces,” the wind
flew up over the hill ruffling its hair as he passed.
I think I was quite right not to answer my uncle Joseph and to
rebuke the south-west wind. People are so tiresomely fond of
uttering generalisations which they do not really believe and on
which they never act. It is surely no less foolish to say that women
are complete mysteries than to say that one understands them
perfectly. Every individual understands a few men and a few women,
or life would be impossible. Besides, understanding has its degrees
which approach, but never reach, perfection. Samuel Butler
somewhere says that the process of love could only be logically
concluded by eating the loved one—a coarse way of saying that
perfect love would end in complete assimilation: it is the same with
the relation of knowledge. Happily love between human beings of
opposite sexes can exist without being pushed to this voracious
conclusion: so can understanding.
It may be true that women have quicker intuitions than men,
though only over a limited range of subjects: but men, on the other
hand, are more widely and studiously observant, besides being far
more interested in the attainment of truth as the result of
observation. Patient induction is, after all, an excellent substitute for
brilliant guessing. Women would be extremely disappointed if men
really acted on the “mystery” theory and took to thinking or writing
as little about woman as the majority think or write about the
problem of existence. Nothing, however, will prevent men from
talking and thinking about women, and a glance at any bookshelf
will prove that they do not always do so in complete ignorance of
their subject. Balzac, who was no magician, was not entirely beside
the mark in creating the Duchesse de Maufrigneuse, and Lady Teazle
is a recognizable being. George Meredith’s Diana seems to have
human substance: Mr Shaw’s Anne in “Man and Superman” and Mr
Wells’ Anne Veronica, though founded on masculine observations,
are admitted by women to be reasonable creations. The laziness of
men, I repeat, and the vanity of women are responsible for the
legend of woman’s inviolable mystery. The laws of gravitation were a
mystery till Newton used his observation: the mystery still remains,
but the experiments of Newton and other physicists have driven it
further back. So it is with the human soul. Each one is a mystery,
but observation and familiarity can penetrate a number of its veils,
leaving only some of the intimate recesses unexplored, and even
these recesses are threatened with exposure as our knowledge of
telepathy and of the subconscious elements increases.
There are certain experiences of women which a man cannot
share, certain aspirations and fears at whose poignancy he can only
guess, certain instinctive impulses of which he is not directly
conscious: but he can surmount the barriers in some measure by the
use of his eyes and ears. If, therefore, he choose to record what his
eyes and ears tell him, he is not exceeding the limits of masculine
capacity. My uncle Joseph could hardly deplore so unpretentious a
line of approach. A mere man may be content to leave Miss Dorothy
Richardson and Miss May Sinclair delving gloomily in the jungles of
feminine psychology where he would fear to follow them, and yet
feel that, without presumption, he may hold some views about his
natural complement. The question is what views are right and what
are wrong. The war has changed many things, and man’s views
about his natural complement among them. Most people, with that
useful faculty of oblivion for which we thank Providence, have
forgotten what they thought in 1914: if there were such a thing as a
mental gramophone which could record their thoughts of five years
ago, they would be extremely surprised. Things that seemed absurd
then have now been taken for granted, and it is possible that many
things taken for granted then may be shown to have become
absurd. It has certainly become ridiculous to speak of the “weaker
sex,” except in a strictly muscular sense. Women have revealed
capacities for organisation and disciplined effort in large bodies,
especially in this country, for which the epithet “surprising” is but
feeble. Has this fact alone not caused a revolution of ideas? If we
have not all accepted it yet, we shall all soon have to accept the
principle that, in all but purely physical exertion, men and women
have equal potential abilities. The potential ability of women is still in
need of development, for they are starting some centuries behind
the men, but the inevitable result will be the recognition of “equal
opportunity.” To what sociological crisis this may lead, I do not know,
and as this is not a sociological treatise, I need not prophesy: but it
is an element that must count heavily in any review of old ideas.
Another element which must count is the franchise, which will, of
course, be extended in the near future till there is no inequality
between the sexes in this respect. Women are political beings with
vast possibilities of becoming a political force. They will play a more
and more important part in the history of the nation. They will dance
a new dance in the ballet of humanity. That recently so familiar
figure in a short skirt of khaki and close-fitting cap, seated firmly but
not too gracefully astride a motor bicycle rushing with its side-car,
and often its male passenger, through the traffic is more than a
phenomenon, it is a symbol. The air has whipped her cheeks pink
and blown loose a stray lock above her determined eyes. What
beauties she has of form or feature are none of them hid. She is all
the woman that the world has known, but with a new purpose and a
new poise. For good or ill she has entered the machine, and we
came to look on her with an indifferent and familiar eye. But what
will she do, what will she think, whither will she carry us in that side-
car of hers? To all her ancient qualities she has added a new one:
object of desire, mother of children, guardian of the hearth, mate of
man or virgin saint, she has now another manifestation, that of
fellow-combatant; some say, also of adversary. One might almost
say that, bending over the handle-bars of her machine, with her
body curved and her legs planted firmly on the footboard she mimes
the very mark of interrogation which her changes of social posture
present. A living query in khaki, she is a challenge to the prophet
and the philosopher. One who is neither will let the challenge pass,
sure only of one thing—that develop as she may and carry us where
she will, the tradition of the good Englishwoman is safe in her
keeping.
“The good Englishwoman,” an untranslatable phrase—I beseech
our French neighbours not to translate it la bonne anglaise—is an
expression which has a corresponding reality. We all know it, in our
flesh, in our bones, in our minds and in our souls. The
Englishwoman is a definite person to all of us in England: she is not
merely the female of the species living in these isles, she has a
significance in the world at large. We love her and we honour her,
but we do not often reflect what it is that we love and honour. It is a
mental occupation which might be more frequently indulged in, were
we not such indifferent reflectors. The ingenious Henry Adams, that
enlightened but pensive American, whose death has just given us
one of the most fascinating books of modern times, spent his whole
life in reflecting on his countrymen, with results which are
stimulating if not encouraging. He did not spend so much time
reflecting on his countrywomen, though he said that he owed more
to them than to any man, but his reflections on that head resolved
themselves into a question which no Englishman would formulate in
similar circumstances. Henry Adams used to invite agreeable and
witty people to dine, and, at an unexpected moment, to propound
to the “brightest” of the women the question: “Why is the American
woman a failure?” He meant a failure as a force rather than as an
individual, but it was an irritating question all the same, nor is it
surprising that it usually drew the answer: “Because the American
man is a failure.” The Englishman would be too chivalrous to ask
such a question of his guests, but he would not even formulate it.
The Englishman, even a considerably sophisticated one, could never
think of the Englishwoman as a failure, whether as an individual, a
force or an inspiration. He is bound by his experience, his upbringing
and his instincts to think of her as a success. Let us then put the
question “Why is the Englishwoman a success?” We shall get no very
good impromptu answers, nor do I suggest that “Because the
Englishman is a success” would be the correct one. We should be
the last to take so much credit to ourselves. We are justly proud of
the Englishwoman, but what is it of which we are proud? Of all the
approving epithets that have been applied to women, which do we
choose for our own? Is our pride in their beauty, their brilliance, their
courage, their wit, their tact, their energy, their endurance, their
sagacity, their skill in handicraft, their devotion to their young, their
taste in art and dress, their grace of movement, the sweetness of
their speech or the greatness of their minds? Are they only an
attraction or an independent force? Are they better mistresses or
mothers? When Henry Adams lived in this country as a young man
he found that “Englishwomen, from the educational point of view,
could give nothing until they approached forty years old. Then they
become very interesting—very charming—to the man of fifty.” What
do we say to such a criticism from so acute a mind?
It is easier to ask questions than to answer them, and I propose
to shirk the harder part of the task. Questions cannot be
satisfactorily answered for other people, and, where everyone has to
make up his or her mind, the mere asking of questions is in itself an
aid to their solution. Each reader will answer the questions I have
asked in a different way: having done so, he must pass to another
consideration. We are proud of the Englishwoman, but we criticise
her, again each one of us differently. We must consider the grounds
of our criticism. She dresses badly, some will say; her hair is always
untidy, say others; foreigners assert that she is proud and stupid;
Englishmen, secretly glad that she is proud, try to forget that she is
poorly educated. That she walks gracefully, none will say, but as an
athlete she is second to none: it would be rash to say that her taste
in the home is remarkable, but the atmosphere of home, which not
even the most hideous decoration can kill nor the most beautiful
create, emanates from her alone. As a housewife she has her glories
and her failings. She has not the almost brutish industry of the
German nor the avaricious acuteness of the French bourgeoise; she
is, in general, neither expert in household industry nor in business.
Nevertheless, the Englishman is only really contented in a household
presided over and served by Englishwomen, and that is not only
because they understand his wants, but because they are genial and
simple, neither servile nor imperious, good comrades who do not
expect too little or exact too much. Fearless in her actions, the
Englishwoman is timid in her ideas: what she may do in the future is
incalculable, her possibilities are unbounded; but there seem to be
limits to the expansion, except by imitation, of her power of thought.
As an administrator she will find no superior, but the political
thinkers, as well as the artists, will for the most part come from
other nations. These are but random criticisms which, among others,
will occur to any mind that reflects upon the subject. They show,
once more, that the essence of the Englishwoman or of her
goodness is not a simple one. She is therefore an excellent topic for
a conversation that should be provocative and stimulating. If I
sustain one part, the reader will mentally sustain the other. Let us
continue it.
It is hardly necessary to say that any criticism of the
Englishwoman in these pages is not an attack upon her: nor is any
approbation to be considered a defence. At least I pay this much
respect to my uncle Joseph that no woman shall flatter me into
defending her: she is more than capable of doing this for herself.
But, beyond this, I quite fail to understand what a friend of mine
meant when he suggested that I should write in defence of women.
“Against whom or against what?” I asked, but his explanation was
not lucid. I gathered that he had in mind the complaint sometimes
heard that women have ceased to be women in order to become
inferior men; that they are getting hard and conceited; that they
turn up their noses at the domestic virtues, at marriage and the
whole conception of life as duty, and that they think only of having
“a good time.” The isolated instances given as grounds for this
complaint are, I am convinced, not typical. That women have
developed and broken through the far too narrow restrictions of a
hundred years ago is only a matter for thankfulness: something is
always lost in every adjustment, but more is gained if the
adjustment is natural. The flighty girl whom most grumblers of this
kind have in mind is only a fraction, and a very imperfect fraction, of
the Englishwoman. A far more serious line was taken by Henry
Adams towards the end of his life, when he became finally convinced
that he was a man of the eighteenth century living in an unfamiliar
world whose guiding forces he could not fathom. Musing over the
enormous mass of new forces put into the hand of man by the end
of the nineteenth century, he wondered what should be the result of
so much energy turned over to the use of women, according to the
scientific notions of force. He could not write down the equation.
The picture of the world that he saw was of man bending eagerly
over the steering wheel of a rushing motor car too intent on keeping
up a high speed and avoiding accidents to have leisure for any
distractions. The old attraction of the woman, one of the most
powerful forces of the past, had become a distraction, and woman,
no longer able to inspire men, had been forced to follow them.
Woman had been set free: as travellers, typists, telephone girls,
factory hands, they moved untrammelled in the world. But in what
direction were they moving? After the men, said Henry Adams;
discarding all the qualities for which men had no longer any interest
or pleasure, they too were bending over the steering wheel in the
same rapid career. Woman the rebel was now free and there was
only one thing left for her to rebel against, maternity, or the inertia
of sex, to speak in terms of force. Inertia of sex, the philosopher
truly remarked, could not be overcome without extinguishing the
race, yet an immense force was working irresistibly to overcome it.
What would happen? Henry Adams gave up the riddle, grateful for
the illusion that woman alone of all the species was unable to
change.
Superficial observers might say that this movement has been
accelerated by the war. Hundreds of homes have loosened their ties
in the stress of war, thousands of unrebellious daughters have left
their narrow walls at the call of patriotism and are now unwilling to
return to them. They have learnt to live in the herd with their own
sex, and prefer it to living with their own sex in the pen; physical
danger and discomfort are no longer bogeys to frighten them; they
have been “on their own,” and “on their own” they intend to stay. All
very true, no doubt, with the added complication of serious
competition between the sexes in a restricted labour market. At the
same time, these superficial observers forget that there has been an
extraordinary return to the traditional relations between men and
women during the war. The inspiration of the woman has never been
stronger; once more, after many years, men have fought for their
women and the women have regarded their champions with
gratitude; women have tended and worked for men in greater
numbers and with greater alacrity than ever before in the history of
the world; the comradeship between the sexes has grown warmer
and stronger without destroying the still more natural relation, for
marriage as an institution has enjoyed a season of abnormal
popularity. In a country at war, especially in a country invaded, men
and women return to the relations of extreme antiquity; the men
fight to protect the home and the family, which they alone can do. If
they are beaten, the home is destroyed and the women are
ravished.
We in England have escaped this last simplification: we have been
lucky, but we have lost the directness of the lesson. Nevertheless, it
is patent enough to thoughtful people. War has revealed men and
women pretty much as they always have been, and the revelation
will not be forgotten. The apprehensions of a Henry Adams, after the
five years of war, do, in fact, appear to be exaggerated. The futility
of all that vast array of mechanical force which so appalled him has


Data Clustering Algorithms and Applications First Edition Charu C. Aggarwal

  • 5. Data Clustering Algorithms and Applications, First Edition
    Author(s): Charu C. Aggarwal, Chandan K. Reddy (eds.)
    ISBN(s): 9781315373515, 1315373513
    Edition: First edition
    File Details: PDF, 12.52 MB
    Year: 2014
    Language: English
  • 6. DATA CLUSTERING: Algorithms and Applications (Aggarwal, Reddy)
    Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains.
    The book focuses on three primary aspects of data clustering:
      • Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization
      • Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data
      • Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation
    In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process—including how to verify the quality of the underlying clusters—through supervision, human intervention, or the automated generation of alternative clusters.
    Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
  • 8. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
    SERIES EDITOR: Vipin Kumar, University of Minnesota, Department of Computer Science and Engineering, Minneapolis, Minnesota, U.S.A.
    AIMS AND SCOPE: This series aims to capture new developments and applications in data mining and knowledge discovery, while summarizing the computational tools and techniques useful in data analysis. This series encourages the integration of mathematical, statistical, and computational methods and techniques through the publication of a broad range of textbooks, reference works, and handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of data mining and knowledge discovery methods and applications, modeling, algorithms, theory and foundations, data and knowledge visualization, data mining systems and tools, and privacy and security issues.
    PUBLISHED TITLES
    ADVANCES IN MACHINE LEARNING AND DATA MINING FOR ASTRONOMY Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
    BIOLOGICAL DATA MINING Jake Y. Chen and Stefano Lonardi
    COMPUTATIONAL INTELLIGENT DATA ANALYSIS FOR SUSTAINABLE DEVELOPMENT Ting Yu, Nitesh V. Chawla, and Simeon Simoff
    COMPUTATIONAL METHODS OF FEATURE SELECTION Huan Liu and Hiroshi Motoda
    CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
    CONTRAST DATA MINING: CONCEPTS, ALGORITHMS, AND APPLICATIONS Guozhu Dong and James Bailey
    DATA CLUSTERING: ALGORITHMS AND APPLICATIONS Charu C. Aggarwal and Chandan K. Reddy
    DATA CLUSTERING IN C++: AN OBJECT-ORIENTED APPROACH Guojun Gan
    DATA MINING FOR DESIGN AND MARKETING Yukio Ohsawa and Katsutoshi Yada
    DATA MINING WITH R: LEARNING WITH CASE STUDIES Luís Torgo
    FOUNDATIONS OF PREDICTIVE ANALYTICS James Wu and Stephen Coggeshall
    GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION Harvey J. Miller and Jiawei Han
    HANDBOOK OF EDUCATIONAL DATA MINING Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
    © 2014 by Taylor & Francis Group, LLC
  • 9. INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS Vagelis Hristidis
    INTELLIGENT TECHNOLOGIES FOR WEB APPLICATIONS Priti Srinivas Sajja and Rajendra Akerkar
    INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING: CONCEPTS AND TECHNIQUES Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
    KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT David Skillicorn
    KNOWLEDGE DISCOVERY FROM DATA STREAMS João Gama
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY FOR ENGINEERING SYSTEMS HEALTH MANAGEMENT Ashok N. Srivastava and Jiawei Han
    MINING SOFTWARE SPECIFICATIONS: METHODOLOGIES AND APPLICATIONS David Lo, Siau-Cheng Khoo, Jiawei Han, and Chao Liu
    MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY Zhongfei Zhang and Ruofei Zhang
    MUSIC DATA MINING Tao Li, Mitsunori Ogihara, and George Tzanetakis
    NEXT GENERATION OF DATA MINING Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar
    PRACTICAL GRAPH MINING WITH R Nagiza F. Samatova, William Hendrix, John Jenkins, Kanchana Padmanabhan, and Arpan Chakraborty
    RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS Bo Long, Zhongfei Zhang, and Philip S. Yu
    SERVICE-ORIENTED DISTRIBUTED KNOWLEDGE DISCOVERY Domenico Talia and Paolo Trunfio
    SPECTRAL FEATURE SELECTION FOR DATA MINING Zheng Alan Zhao and Huan Liu
    STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION George Fernandez
    SUPPORT VECTOR MACHINES: OPTIMIZATION BASED THEORY, ALGORITHMS, AND EXTENSIONS Naiyang Deng, Yingjie Tian, and Chunhua Zhang
    TEMPORAL DATA MINING Theophano Mitsa
    TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS Ashok N. Srivastava and Mehran Sahami
    THE TOP TEN ALGORITHMS IN DATA MINING Xindong Wu and Vipin Kumar
    UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS David Skillicorn
  • 11. DATA CLUSTERING: Algorithms and Applications. Edited by Charu C. Aggarwal and Chandan K. Reddy
  • 12. CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742. © 2014 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group, an Informa business. No claim to original U.S. Government works. Version Date: 20130508. International Standard Book Number-13: 978-1-4665-5822-9 (eBook - PDF). This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
  • 13. Contents
    Preface
    Editor Biographies
    Contributors
    1 An Introduction to Cluster Analysis
      Charu C. Aggarwal
      1.1 Introduction
      1.2 Common Techniques Used in Cluster Analysis
        1.2.1 Feature Selection Methods
        1.2.2 Probabilistic and Generative Models
        1.2.3 Distance-Based Algorithms
        1.2.4 Density- and Grid-Based Methods
        1.2.5 Leveraging Dimensionality Reduction Methods
          1.2.5.1 Generative Models for Dimensionality Reduction
          1.2.5.2 Matrix Factorization and Co-Clustering
          1.2.5.3 Spectral Methods
        1.2.6 The High Dimensional Scenario
        1.2.7 Scalable Techniques for Cluster Analysis
          1.2.7.1 I/O Issues in Database Management
          1.2.7.2 Streaming Algorithms
          1.2.7.3 The Big Data Framework
      1.3 Data Types Studied in Cluster Analysis
        1.3.1 Clustering Categorical Data
        1.3.2 Clustering Text Data
        1.3.3 Clustering Multimedia Data
        1.3.4 Clustering Time-Series Data
        1.3.5 Clustering Discrete Sequences
        1.3.6 Clustering Network Data
        1.3.7 Clustering Uncertain Data
      1.4 Insights Gained from Different Variations of Cluster Analysis
        1.4.1 Visual Insights
        1.4.2 Supervised Insights
        1.4.3 Multiview and Ensemble-Based Insights
        1.4.4 Validation-Based Insights
      1.5 Discussion and Conclusions
  • 14. Contents
    2 Feature Selection for Clustering: A Review
      Salem Alelyani, Jiliang Tang, and Huan Liu
      2.1 Introduction
        2.1.1 Data Clustering
        2.1.2 Feature Selection
        2.1.3 Feature Selection for Clustering
          2.1.3.1 Filter Model
          2.1.3.2 Wrapper Model
          2.1.3.3 Hybrid Model
      2.2 Feature Selection for Clustering
        2.2.1 Algorithms for Generic Data
          2.2.1.1 Spectral Feature Selection (SPEC)
          2.2.1.2 Laplacian Score (LS)
          2.2.1.3 Feature Selection for Sparse Clustering
          2.2.1.4 Localized Feature Selection Based on Scatter Separability (LFSBSS)
          2.2.1.5 Multicluster Feature Selection (MCFS)
          2.2.1.6 Feature Weighting k-Means
        2.2.2 Algorithms for Text Data
          2.2.2.1 Term Frequency (TF)
          2.2.2.2 Inverse Document Frequency (IDF)
          2.2.2.3 Term Frequency-Inverse Document Frequency (TF-IDF)
          2.2.2.4 Chi Square Statistic
          2.2.2.5 Frequent Term-Based Text Clustering
          2.2.2.6 Frequent Term Sequence
        2.2.3 Algorithms for Streaming Data
          2.2.3.1 Text Stream Clustering Based on Adaptive Feature Selection (TSC-AFS)
          2.2.3.2 High-Dimensional Projected Stream Clustering (HPStream)
        2.2.4 Algorithms for Linked Data
          2.2.4.1 Challenges and Opportunities
          2.2.4.2 LUFS: An Unsupervised Feature Selection Framework for Linked Data
          2.2.4.3 Conclusion and Future Work for Linked Data
      2.3 Discussions and Challenges
        2.3.1 The Chicken or the Egg Dilemma
        2.3.2 Model Selection: K and l
        2.3.3 Scalability
        2.3.4 Stability
    3 Probabilistic Models for Clustering
      Hongbo Deng and Jiawei Han
      3.1 Introduction
      3.2 Mixture Models
        3.2.1 Overview
        3.2.2 Gaussian Mixture Model
        3.2.3 Bernoulli Mixture Model
        3.2.4 Model Selection Criteria
      3.3 EM Algorithm and Its Variations
        3.3.1 The General EM Algorithm
        3.3.2 Mixture Models Revisited
  • 15. Contents
        3.3.3 Limitations of the EM Algorithm
        3.3.4 Applications of the EM Algorithm
      3.4 Probabilistic Topic Models
        3.4.1 Probabilistic Latent Semantic Analysis
        3.4.2 Latent Dirichlet Allocation
        3.4.3 Variations and Extensions
      3.5 Conclusions and Summary
    4 A Survey of Partitional and Hierarchical Clustering Algorithms
      Chandan K. Reddy and Bhanukiran Vinzamuri
      4.1 Introduction
      4.2 Partitional Clustering Algorithms
        4.2.1 K-Means Clustering
        4.2.2 Minimization of Sum of Squared Errors
        4.2.3 Factors Affecting K-Means
          4.2.3.1 Popular Initialization Methods
          4.2.3.2 Estimating the Number of Clusters
        4.2.4 Variations of K-Means
          4.2.4.1 K-Medoids Clustering
          4.2.4.2 K-Medians Clustering
          4.2.4.3 K-Modes Clustering
          4.2.4.4 Fuzzy K-Means Clustering
          4.2.4.5 X-Means Clustering
          4.2.4.6 Intelligent K-Means Clustering
          4.2.4.7 Bisecting K-Means Clustering
          4.2.4.8 Kernel K-Means Clustering
          4.2.4.9 Mean Shift Clustering
          4.2.4.10 Weighted K-Means Clustering
          4.2.4.11 Genetic K-Means Clustering
        4.2.5 Making K-Means Faster
      4.3 Hierarchical Clustering Algorithms
        4.3.1 Agglomerative Clustering
          4.3.1.1 Single and Complete Link
          4.3.1.2 Group Averaged and Centroid Agglomerative Clustering
          4.3.1.3 Ward’s Criterion
          4.3.1.4 Agglomerative Hierarchical Clustering Algorithm
          4.3.1.5 Lance–Williams Dissimilarity Update Formula
        4.3.2 Divisive Clustering
          4.3.2.1 Issues in Divisive Clustering
          4.3.2.2 Divisive Hierarchical Clustering Algorithm
          4.3.2.3 Minimum Spanning Tree-Based Clustering
        4.3.3 Other Hierarchical Clustering Algorithms
      4.4 Discussion and Summary
    5 Density-Based Clustering
      Martin Ester
      5.1 Introduction
      5.2 DBSCAN
      5.3 DENCLUE
      5.4 OPTICS
      5.5 Other Algorithms
  • 16. Contents
      5.6 Subspace Clustering
      5.7 Clustering Networks
      5.8 Other Directions
      5.9 Conclusion
    6 Grid-Based Clustering
      Wei Cheng, Wei Wang, and Sandra Batista
      6.1 Introduction
      6.2 The Classical Algorithms
        6.2.1 Earliest Approaches: GRIDCLUS and BANG
        6.2.2 STING and STING+: The Statistical Information Grid Approach
        6.2.3 WaveCluster: Wavelets in Grid-Based Clustering
      6.3 Adaptive Grid-Based Algorithms
        6.3.1 AMR: Adaptive Mesh Refinement Clustering
      6.4 Axis-Shifting Grid-Based Algorithms
        6.4.1 NSGC: New Shifting Grid Clustering Algorithm
        6.4.2 ADCC: Adaptable Deflect and Conquer Clustering
        6.4.3 ASGC: Axis-Shifted Grid-Clustering
        6.4.4 GDILC: Grid-Based Density-IsoLine Clustering Algorithm
      6.5 High-Dimensional Algorithms
        6.5.1 CLIQUE: The Classical High-Dimensional Algorithm
        6.5.2 Variants of CLIQUE
          6.5.2.1 ENCLUS: Entropy-Based Approach
          6.5.2.2 MAFIA: Adaptive Grids in High Dimensions
        6.5.3 OptiGrid: Density-Based Optimal Grid Partitioning
        6.5.4 Variants of the OptiGrid Approach
          6.5.4.1 O-Cluster: A Scalable Approach
          6.5.4.2 CBF: Cell-Based Filtering
      6.6 Conclusions and Summary
    7 Nonnegative Matrix Factorizations for Clustering: A Survey
      Tao Li and Chris Ding
      7.1 Introduction
        7.1.1 Background
        7.1.2 NMF Formulations
      7.2 NMF for Clustering: Theoretical Foundations
        7.2.1 NMF and K-Means Clustering
        7.2.2 NMF and Probabilistic Latent Semantic Indexing
        7.2.3 NMF and Kernel K-Means and Spectral Clustering
        7.2.4 NMF Boundedness Theorem
      7.3 NMF Clustering Capabilities
        7.3.1 Examples
        7.3.2 Analysis
      7.4 NMF Algorithms
        7.4.1 Introduction
        7.4.2 Algorithm Development
        7.4.3 Practical Issues in NMF Algorithms
          7.4.3.1 Initialization
          7.4.3.2 Stopping Criteria
          7.4.3.3 Objective Function vs. Clustering Performance
          7.4.3.4 Scalability
  • 17. Contents
      7.5 NMF Related Factorizations
      7.6 NMF for Clustering: Extensions
        7.6.1 Co-Clustering
        7.6.2 Semisupervised Clustering
        7.6.3 Semisupervised Co-Clustering
        7.6.4 Consensus Clustering
        7.6.5 Graph Clustering
        7.6.6 Other Clustering Extensions
      7.7 Conclusions
    8 Spectral Clustering
      Jialu Liu and Jiawei Han
      8.1 Introduction
      8.2 Similarity Graph
      8.3 Unnormalized Spectral Clustering
        8.3.1 Notation
        8.3.2 Unnormalized Graph Laplacian
        8.3.3 Spectrum Analysis
        8.3.4 Unnormalized Spectral Clustering Algorithm
      8.4 Normalized Spectral Clustering
        8.4.1 Normalized Graph Laplacian
        8.4.2 Spectrum Analysis
        8.4.3 Normalized Spectral Clustering Algorithm
      8.5 Graph Cut View
        8.5.1 Ratio Cut Relaxation
        8.5.2 Normalized Cut Relaxation
      8.6 Random Walks View
      8.7 Connection to Laplacian Eigenmap
      8.8 Connection to Kernel k-Means and Nonnegative Matrix Factorization
      8.9 Large Scale Spectral Clustering
      8.10 Further Reading
    9 Clustering High-Dimensional Data
      Arthur Zimek
      9.1 Introduction
      9.2 The “Curse of Dimensionality”
        9.2.1 Different Aspects of the “Curse”
        9.2.2 Consequences
      9.3 Clustering Tasks in Subspaces of High-Dimensional Data
        9.3.1 Categories of Subspaces
          9.3.1.1 Axis-Parallel Subspaces
          9.3.1.2 Arbitrarily Oriented Subspaces
          9.3.1.3 Special Cases
        9.3.2 Search Spaces for the Clustering Problem
      9.4 Fundamental Algorithmic Ideas
        9.4.1 Clustering in Axis-Parallel Subspaces
          9.4.1.1 Cluster Model
          9.4.1.2 Basic Techniques
          9.4.1.3 Clustering Algorithms
        9.4.2 Clustering in Arbitrarily Oriented Subspaces
          9.4.2.1 Cluster Model
  • 18. Contents
          9.4.2.2 Basic Techniques and Example Algorithms
      9.5 Open Questions and Current Research Directions
      9.6 Conclusion
    10 A Survey of Stream Clustering Algorithms
      Charu C. Aggarwal
      10.1 Introduction
      10.2 Methods Based on Partitioning Representatives
        10.2.1 The STREAM Algorithm
        10.2.2 CluStream: The Microclustering Framework
          10.2.2.1 Microcluster Definition
          10.2.2.2 Pyramidal Time Frame
          10.2.2.3 Online Clustering with CluStream
      10.3 Density-Based Stream Clustering
        10.3.1 DenStream: Density-Based Microclustering
        10.3.2 Grid-Based Streaming Algorithms
          10.3.2.1 D-Stream Algorithm
          10.3.2.2 Other Grid-Based Algorithms
      10.4 Probabilistic Streaming Algorithms
      10.5 Clustering High-Dimensional Streams
        10.5.1 The HPSTREAM Method
        10.5.2 Other High-Dimensional Streaming Algorithms
      10.6 Clustering Discrete and Categorical Streams
        10.6.1 Clustering Binary Data Streams with k-Means
        10.6.2 The StreamCluCD Algorithm
        10.6.3 Massive-Domain Clustering
      10.7 Text Stream Clustering
      10.8 Other Scenarios for Stream Clustering
        10.8.1 Clustering Uncertain Data Streams
        10.8.2 Clustering Graph Streams
        10.8.3 Distributed Clustering of Data Streams
      10.9 Discussion and Conclusions
    11 Big Data Clustering
      Hanghang Tong and U Kang
      11.1 Introduction
      11.2 One-Pass Clustering Algorithms
        11.2.1 CLARANS: Fighting with Exponential Search Space
        11.2.2 BIRCH: Fighting with Limited Memory
        11.2.3 CURE: Fighting with the Irregular Clusters
      11.3 Randomized Techniques for Clustering Algorithms
        11.3.1 Locality-Preserving Projection
        11.3.2 Global Projection
      11.4 Parallel and Distributed Clustering Algorithms
        11.4.1 General Framework
        11.4.2 DBDC: Density-Based Clustering
        11.4.3 ParMETIS: Graph Partitioning
        11.4.4 PKMeans: K-Means with MapReduce
        11.4.5 DisCo: Co-Clustering with MapReduce
        11.4.6 BoW: Subspace Clustering with MapReduce
      11.5 Conclusion
  • 19. Contents
    12 Clustering Categorical Data
      Bill Andreopoulos
      12.1 Introduction
      12.2 Goals of Categorical Clustering
        12.2.1 Clustering Road Map
      12.3 Similarity Measures for Categorical Data
        12.3.1 The Hamming Distance in Categorical and Binary Data
        12.3.2 Probabilistic Measures
        12.3.3 Information-Theoretic Measures
        12.3.4 Context-Based Similarity Measures
      12.4 Descriptions of Algorithms
        12.4.1 Partition-Based Clustering
          12.4.1.1 k-Modes
          12.4.1.2 k-Prototypes (Mixed Categorical and Numerical)
          12.4.1.3 Fuzzy k-Modes
          12.4.1.4 Squeezer
          12.4.1.5 COOLCAT
        12.4.2 Hierarchical Clustering
          12.4.2.1 ROCK
          12.4.2.2 COBWEB
          12.4.2.3 LIMBO
        12.4.3 Density-Based Clustering
          12.4.3.1 Projected (Subspace) Clustering
          12.4.3.2 CACTUS
          12.4.3.3 CLICKS
          12.4.3.4 STIRR
          12.4.3.5 CLOPE
          12.4.3.6 HIERDENC: Hierarchical Density-Based Clustering
          12.4.3.7 MULIC: Multiple Layer Incremental Clustering
        12.4.4 Model-Based Clustering
          12.4.4.1 BILCOM Empirical Bayesian (Mixed Categorical and Numerical)
          12.4.4.2 AutoClass (Mixed Categorical and Numerical)
          12.4.4.3 SVM Clustering (Mixed Categorical and Numerical)
      12.5 Conclusion
    13 Document Clustering: The Next Frontier
      David C. Anastasiu, Andrea Tagarelli, and George Karypis
      13.1 Introduction
      13.2 Modeling a Document
        13.2.1 Preliminaries
        13.2.2 The Vector Space Model
        13.2.3 Alternate Document Models
        13.2.4 Dimensionality Reduction for Text
        13.2.5 Characterizing Extremes
      13.3 General Purpose Document Clustering
        13.3.1 Similarity/Dissimilarity-Based Algorithms
        13.3.2 Density-Based Algorithms
        13.3.3 Adjacency-Based Algorithms
        13.3.4 Generative Algorithms
      13.4 Clustering Long Documents
  • 20. Contents
13.4.1 Document Segmentation
13.4.2 Clustering Segmented Documents
13.4.3 Simultaneous Segment Identification and Clustering
13.5 Clustering Short Documents
13.5.1 General Methods for Short Document Clustering
13.5.2 Clustering with Knowledge Infusion
13.5.3 Clustering Web Snippets
13.5.4 Clustering Microblogs
13.6 Conclusion
14 Clustering Multimedia Data
Shen-Fu Tsai, Guo-Jun Qi, Shiyu Chang, Min-Hsuan Tsai, and Thomas S. Huang
14.1 Introduction
14.2 Clustering with Image Data
14.2.1 Visual Words Learning
14.2.2 Face Clustering and Annotation
14.2.3 Photo Album Event Recognition
14.2.4 Image Segmentation
14.2.5 Large-Scale Image Classification
14.3 Clustering with Video and Audio Data
14.3.1 Video Summarization
14.3.2 Video Event Detection
14.3.3 Video Story Clustering
14.3.4 Music Summarization
14.4 Clustering with Multimodal Data
14.5 Summary and Future Directions
15 Time-Series Data Clustering
Dimitrios Kotsakos, Goce Trajcevski, Dimitrios Gunopulos, and Charu C. Aggarwal
15.1 Introduction
15.2 The Diverse Formulations for Time-Series Clustering
15.3 Online Correlation-Based Clustering
15.3.1 Selective Muscles and Related Methods
15.3.2 Sensor Selection Algorithms for Correlation Clustering
15.4 Similarity and Distance Measures
15.4.1 Univariate Distance Measures
15.4.1.1 Lp Distance
15.4.1.2 Dynamic Time Warping Distance
15.4.1.3 EDIT Distance
15.4.1.4 Longest Common Subsequence
15.4.2 Multivariate Distance Measures
15.4.2.1 Multidimensional Lp Distance
15.4.2.2 Multidimensional DTW
15.4.2.3 Multidimensional LCSS
15.4.2.4 Multidimensional Edit Distance
15.4.2.5 Multidimensional Subsequence Matching
15.5 Shape-Based Time-Series Clustering Techniques
15.5.1 k-Means Clustering
15.5.2 Hierarchical Clustering
15.5.3 Density-Based Clustering
  • 21. Contents
15.5.4 Trajectory Clustering
15.6 Time-Series Clustering Applications
15.7 Conclusions
16 Clustering Biological Data
Chandan K. Reddy, Mohammad Al Hasan, and Mohammed J. Zaki
16.1 Introduction
16.2 Clustering Microarray Data
16.2.1 Proximity Measures
16.2.2 Categorization of Algorithms
16.2.3 Standard Clustering Algorithms
16.2.3.1 Hierarchical Clustering
16.2.3.2 Probabilistic Clustering
16.2.3.3 Graph-Theoretic Clustering
16.2.3.4 Self-Organizing Maps
16.2.3.5 Other Clustering Methods
16.2.4 Biclustering
16.2.4.1 Types and Structures of Biclusters
16.2.4.2 Biclustering Algorithms
16.2.4.3 Recent Developments
16.2.5 Triclustering
16.2.6 Time-Series Gene Expression Data Clustering
16.2.7 Cluster Validation
16.3 Clustering Biological Networks
16.3.1 Characteristics of PPI Network Data
16.3.2 Network Clustering Algorithms
16.3.2.1 Molecular Complex Detection
16.3.2.2 Markov Clustering
16.3.2.3 Neighborhood Search Methods
16.3.2.4 Clique Percolation Method
16.3.2.5 Ensemble Clustering
16.3.2.6 Other Clustering Methods
16.3.3 Cluster Validation and Challenges
16.4 Biological Sequence Clustering
16.4.1 Sequence Similarity Metrics
16.4.1.1 Alignment-Based Similarity
16.4.1.2 Keyword-Based Similarity
16.4.1.3 Kernel-Based Similarity
16.4.1.4 Model-Based Similarity
16.4.2 Sequence Clustering Algorithms
16.4.2.1 Subsequence-Based Clustering
16.4.2.2 Graph-Based Clustering
16.4.2.3 Probabilistic Models
16.4.2.4 Suffix Tree and Suffix Array-Based Method
16.5 Software Packages
16.6 Discussion and Summary
  • 22. Contents
17 Network Clustering
Srinivasan Parthasarathy and S M Faisal
17.1 Introduction
17.2 Background and Nomenclature
17.3 Problem Definition
17.4 Common Evaluation Criteria
17.5 Partitioning with Geometric Information
17.5.1 Coordinate Bisection
17.5.2 Inertial Bisection
17.5.3 Geometric Partitioning
17.6 Graph Growing and Greedy Algorithms
17.6.1 Kernighan-Lin Algorithm
17.7 Agglomerative and Divisive Clustering
17.8 Spectral Clustering
17.8.1 Similarity Graphs
17.8.2 Types of Similarity Graphs
17.8.3 Graph Laplacians
17.8.3.1 Unnormalized Graph Laplacian
17.8.3.2 Normalized Graph Laplacians
17.8.4 Spectral Clustering Algorithms
17.9 Markov Clustering
17.9.1 Regularized MCL (RMCL): Improvement over MCL
17.10 Multilevel Partitioning
17.11 Local Partitioning Algorithms
17.12 Hypergraph Partitioning
17.13 Emerging Methods for Partitioning Special Graphs
17.13.1 Bipartite Graphs
17.13.2 Dynamic Graphs
17.13.3 Heterogeneous Networks
17.13.4 Directed Networks
17.13.5 Combining Content and Relationship Information
17.13.6 Networks with Overlapping Communities
17.13.7 Probabilistic Methods
17.14 Conclusion
18 A Survey of Uncertain Data Clustering Algorithms
Charu C. Aggarwal
18.1 Introduction
18.2 Mixture Model Clustering of Uncertain Data
18.3 Density-Based Clustering Algorithms
18.3.1 FDBSCAN Algorithm
18.3.2 FOPTICS Algorithm
18.4 Partitional Clustering Algorithms
18.4.1 The UK-Means Algorithm
18.4.2 The CK-Means Algorithm
18.4.3 Clustering Uncertain Data with Voronoi Diagrams
18.4.4 Approximation Algorithms for Clustering Uncertain Data
18.4.5 Speeding Up Distance Computations
18.5 Clustering Uncertain Data Streams
18.5.1 The UMicro Algorithm
18.5.2 The LuMicro Algorithm
  • 23. Contents
18.5.3 Enhancements to Stream Clustering
18.6 Clustering Uncertain Data in High Dimensionality
18.6.1 Subspace Clustering of Uncertain Data
18.6.2 UPStream: Projected Clustering of Uncertain Data Streams
18.7 Clustering with the Possible Worlds Model
18.8 Clustering Uncertain Graphs
18.9 Conclusions and Summary
19 Concepts of Visual and Interactive Clustering
Alexander Hinneburg
19.1 Introduction
19.2 Direct Visual and Interactive Clustering
19.2.1 Scatterplots
19.2.2 Parallel Coordinates
19.2.3 Discussion
19.3 Visual Interactive Steering of Clustering
19.3.1 Visual Assessment of Convergence of Clustering Algorithm
19.3.2 Interactive Hierarchical Clustering
19.3.3 Visual Clustering with SOMs
19.3.4 Discussion
19.4 Interactive Comparison and Combination of Clusterings
19.4.1 Space of Clusterings
19.4.2 Visualization
19.4.3 Discussion
19.5 Visualization of Clusters for Sense-Making
19.6 Summary
20 Semisupervised Clustering
Amrudin Agovic and Arindam Banerjee
20.1 Introduction
20.2 Clustering with Pointwise and Pairwise Semisupervision
20.2.1 Semisupervised Clustering Based on Seeding
20.2.2 Semisupervised Clustering Based on Pairwise Constraints
20.2.3 Active Learning for Semisupervised Clustering
20.2.4 Semisupervised Clustering Based on User Feedback
20.2.5 Semisupervised Clustering Based on Nonnegative Matrix Factorization
20.3 Semisupervised Graph Cuts
20.3.1 Semisupervised Unnormalized Cut
20.3.2 Semisupervised Ratio Cut
20.3.3 Semisupervised Normalized Cut
20.4 A Unified View of Label Propagation
20.4.1 Generalized Label Propagation
20.4.2 Gaussian Fields
20.4.3 Tikhonov Regularization (TIKREG)
20.4.4 Local and Global Consistency
20.4.5 Related Methods
20.4.5.1 Cluster Kernels
20.4.5.2 Gaussian Random Walks EM (GWEM)
20.4.5.3 Linear Neighborhood Propagation
20.4.6 Label Propagation and Green’s Function
20.4.7 Label Propagation and Semisupervised Graph Cuts
  • 24. Contents
20.5 Semisupervised Embedding
20.5.1 Nonlinear Manifold Embedding
20.5.2 Semisupervised Embedding
20.5.2.1 Unconstrained Semisupervised Embedding
20.5.2.2 Constrained Semisupervised Embedding
20.6 Comparative Experimental Analysis
20.6.1 Experimental Results
20.6.2 Semisupervised Embedding Methods
20.7 Conclusions
21 Alternative Clustering Analysis: A Review
James Bailey
21.1 Introduction
21.2 Technical Preliminaries
21.3 Multiple Clustering Analysis Using Alternative Clusterings
21.3.1 Alternative Clustering Algorithms: A Taxonomy
21.3.2 Unguided Generation
21.3.2.1 Naive
21.3.2.2 Meta Clustering
21.3.2.3 Eigenvectors of the Laplacian Matrix
21.3.2.4 Decorrelated k-Means and Convolutional EM
21.3.2.5 CAMI
21.3.3 Guided Generation with Constraints
21.3.3.1 COALA
21.3.3.2 Constrained Optimization Approach
21.3.3.3 MAXIMUS
21.3.4 Orthogonal Transformation Approaches
21.3.4.1 Orthogonal Views
21.3.4.2 ADFT
21.3.5 Information Theoretic
21.3.5.1 Conditional Information Bottleneck (CIB)
21.3.5.2 Conditional Ensemble Clustering
21.3.5.3 NACI
21.3.5.4 mSC
21.4 Connections to Multiview Clustering and Subspace Clustering
21.5 Future Research Issues
21.6 Summary
22 Cluster Ensembles: Theory and Applications
Joydeep Ghosh and Ayan Acharya
22.1 Introduction
22.2 The Cluster Ensemble Problem
22.3 Measuring Similarity Between Clustering Solutions
22.4 Cluster Ensemble Algorithms
22.4.1 Probabilistic Approaches to Cluster Ensembles
22.4.1.1 A Mixture Model for Cluster Ensembles (MMCE)
22.4.1.2 Bayesian Cluster Ensembles (BCE)
22.4.1.3 Nonparametric Bayesian Cluster Ensembles (NPBCE)
22.4.2 Pairwise Similarity-Based Approaches
22.4.2.1 Methods Based on Ensemble Co-Association Matrix
  • 25. Contents
22.4.2.2 Relating Consensus Clustering to Other Optimization Formulations
22.4.3 Direct Approaches Using Cluster Labels
22.4.3.1 Graph Partitioning
22.4.3.2 Cumulative Voting
22.5 Applications of Consensus Clustering
22.5.1 Gene Expression Data Analysis
22.5.2 Image Segmentation
22.6 Concluding Remarks
23 Clustering Validation Measures
Hui Xiong and Zhongmou Li
23.1 Introduction
23.2 External Clustering Validation Measures
23.2.1 An Overview of External Clustering Validation Measures
23.2.2 Defective Validation Measures
23.2.2.1 K-Means: The Uniform Effect
23.2.2.2 A Necessary Selection Criterion
23.2.2.3 The Cluster Validation Results
23.2.2.4 The Issues with the Defective Measures
23.2.2.5 Improving the Defective Measures
23.2.3 Measure Normalization
23.2.3.1 Normalizing the Measures
23.2.3.2 The DCV Criterion
23.2.3.3 The Effect of Normalization
23.2.4 Measure Properties
23.2.4.1 The Consistency Between Measures
23.2.4.2 Properties of Measures
23.2.4.3 Discussions
23.3 Internal Clustering Validation Measures
23.3.1 An Overview of Internal Clustering Validation Measures
23.3.2 Understanding of Internal Clustering Validation Measures
23.3.2.1 The Impact of Monotonicity
23.3.2.2 The Impact of Noise
23.3.2.3 The Impact of Density
23.3.2.4 The Impact of Subclusters
23.3.2.5 The Impact of Skewed Distributions
23.3.2.6 The Impact of Arbitrary Shapes
23.3.3 Properties of Measures
23.4 Summary
24 Educational and Software Resources for Data Clustering
Charu C. Aggarwal and Chandan K. Reddy
24.1 Introduction
24.2 Educational Resources
24.2.1 Books on Data Clustering
24.2.2 Popular Survey Papers on Data Clustering
24.3 Software for Data Clustering
24.3.1 Free and Open-Source Software
24.3.1.1 General Clustering Software
24.3.1.2 Specialized Clustering Software
  • 30. The Project Gutenberg eBook of The Good Englishwoman
  • 31. This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.

Title: The Good Englishwoman
Author: Orlo Williams
Release date: October 6, 2018 [eBook #58041]
Language: English
Credits: Produced by MFR, Les Galloway and the Online Distributed Proofreading Team at http://www.pgdp.net (This file was produced from images generously made available by The Internet Archive/American Libraries.)

*** START OF THE PROJECT GUTENBERG EBOOK THE GOOD ENGLISHWOMAN ***
  • 32. Transcriber’s Notes

Obvious typographical errors have been silently corrected. Variations in hyphenation have been standardised but all other spelling and punctuation remains unchanged.

The half title immediately before the title page has been omitted.
  • 33. THE GOOD ENGLISHWOMAN
BY ORLO WILLIAMS, M.C.
Author of “Vie de Boheme: A Patch of Romantic Paris,” “The Life and Letters of John Rickman,” etc.
LONDON
GRANT RICHARDS LTD.
ST MARTIN’S STREET
MDCCCCXX
PRINTED IN GREAT BRITAIN BY THE DUNEDIN PRESS LIMITED, EDINBURGH
  • 34. TO BETTY WHEN SHE IS OLDER WITH THE SUPERFLUOUS INJUNCTION NOT TO TAKE THIS BOOK TOO SERIOUSLY
  • 36. CONTENTS
I. The Man in the Sidecar
II. Little Girls
III. Big Girls
IV. The English Wife
V. The English Mother
VI. The Englishwoman’s Mind
VII. The Englishwoman’s Manners
VIII. The Englishwoman and the Arts
IX. The Englishwoman in Society
X. The Englishwoman at Work
XI. The Englishwoman at Play
XII. The Englishwoman in Parliament
  • 38. CHAPTER I
A FEW REMARKS FROM THE MAN IN THE SIDECAR

My uncle Joseph, a solitary man, once broke the silence of a country walk by asserting with explosive emphasis: “I don’t see how any man can understand women.” I assented vaguely, and he went on: “How can we ever grasp their point of view, my dear boy, which is so totally different from ours? How can we understand the outlook on life of beings whose instincts, training, purpose, ambitions have so little resemblance to ours? For my part I have given up trying: it is a waste of time. Never let a woman flatter you into thinking that you understand her: she is trying to make you her tool. The Egyptians gave the Sphinx a woman’s face and they were right. Women are so mysterious.” And the south-west wind took up his words and whispered them to the trees, which nodded their heads and waved their branches, rustling “mysterious, mysterious” in all their leaves.

I do not argue with my uncle Joseph, especially on a country walk when the south-west wind is blowing. So I took out my pipe and lit it in spite of the south-west wind, saying to myself: “You silly wind, you silly trees, you know nothing of wisdom. You would catch up anything that my uncle Joseph said and make it seem important.” And the south-west wind solemnly breathed “important” into the ear of a little quarry, in the tone of a ripe family butler. “There is just as much, and just as little, mystery about men and women as there is about you. It depends how much one wants to know. So far as there is any mystery, as a matter of fact, it is much more on the side of men, who are far more incalculable, far more complex than women in their motives and reactions. But men are lazy, you silly old things,
  • 39. and it saves a lot of trouble to invent a mystery and give it up rather than sit down before a problem to study it. Men have thousands of other things to think about besides women, but women, who have not the same variety, are so devilish insistent, that they would keep men thinking about them all their time if they could. So, in self-defence, men have pacified the dear things by calling them mysterious, which is highly flattering, and by giving them up for three-quarters of their days. Uncle Joseph has probably been arguing unsuccessfully with Aunt Georgiana, as he always will, because he never took the trouble to master her mental and emotional processes. But that does not prove the general truth of his proposition. His is just the mind which grows those weeds of everyday thought the seeds of which thoughtless south-west winds blow about as they do the seeds of thistles. Go off and blow those clouds away, you reverberator of commonplaces.” Throwing up his hands with a shriek of “commonplaces,” the wind flew up over the hill ruffling its hair as he passed.

I think I was quite right not to answer my uncle Joseph and to rebuke the south-west wind. People are so tiresomely fond of uttering generalisations which they do not really believe and on which they never act. It is surely no less foolish to say that women are complete mysteries than to say that one understands them perfectly. Every individual understands a few men and a few women, or life would be impossible. Besides, understanding has its degrees which approach, but never reach, perfection. Samuel Butler somewhere says that the process of love could only be logically concluded by eating the loved one—a coarse way of saying that perfect love would end in complete assimilation: it is the same with the relation of knowledge. Happily love between human beings of opposite sexes can exist without being pushed to this voracious conclusion: so can understanding.
It may be true that women have quicker intuitions than men, though only over a limited range of subjects: but men, on the other hand, are more widely and studiously observant, besides being far
  • 40. more interested in the attainment of truth as the result of observation. Patient induction is, after all, an excellent substitute for brilliant guessing. Women would be extremely disappointed if men really acted on the “mystery” theory and took to thinking or writing as little about woman as the majority think or write about the problem of existence. Nothing, however, will prevent men from talking and thinking about women, and a glance at any bookshelf will prove that they do not always do so in complete ignorance of their subject. Balzac, who was no magician, was not entirely beside the mark in creating the Duchesse de Maufrigneuse, and Lady Teazle is a recognizable being. George Meredith’s Diana seems to have human substance: Mr Shaw’s Ann in “Man and Superman” and Mr Wells’ Ann Veronica, though founded on masculine observations, are admitted by women to be reasonable creations. The laziness of men, I repeat, and the vanity of women are responsible for the legend of woman’s inviolable mystery. The laws of gravitation were a mystery till Newton used his observation: the mystery still remains, but the experiments of Newton and other physicists have driven it further back. So it is with the human soul. Each one is a mystery, but observation and familiarity can penetrate a number of its veils, leaving only some of the intimate recesses unexplored, and even these recesses are threatened with exposure as our knowledge of telepathy and of the subconscious elements increases. There are certain experiences of women which a man cannot share, certain aspirations and fears at whose poignancy he can only guess, certain instinctive impulses of which he is not directly conscious: but he can surmount the barriers in some measure by the use of his eyes and ears. If, therefore, he choose to record what his eyes and ears tell him, he is not exceeding the limits of masculine capacity. My uncle Joseph could hardly deplore so unpretentious a line of approach.
A mere man may be content to leave Miss Dorothy Richardson and Miss May Sinclair delving gloomily in the jungles of feminine psychology where he would fear to follow them, and yet feel that, without presumption, he may hold some views about his natural complement. The question is what views are right and what
  • 41. are wrong. The war has changed many things, and man’s views about his natural complement among them. Most people, with that useful faculty of oblivion for which we thank Providence, have forgotten what they thought in 1914: if there were such a thing as a mental gramophone which could record their thoughts of five years ago, they would be extremely surprised. Things that seemed absurd then have now been taken for granted, and it is possible that many things taken for granted then may be shown to have become absurd. It has certainly become ridiculous to speak of the “weaker sex,” except in a strictly muscular sense. Women have revealed capacities for organisation and disciplined effort in large bodies, especially in this country, for which the epithet “surprising” is but feeble. Has this fact alone not caused a revolution of ideas? If we have not all accepted it yet, we shall all soon have to accept the principle that, in all but purely physical exertion, men and women have equal potential abilities. The potential ability of women is still in need of development, for they are starting some centuries behind the men, but the inevitable result will be the recognition of “equal opportunity.” To what sociological crisis this may lead, I do not know, and as this is not a sociological treatise, I need not prophesy: but it is an element that must count heavily in any review of old ideas. Another element which must count is the franchise, which will, of course, be extended in the near future till there is no inequality between the sexes in this respect. Women are political beings with vast possibilities of becoming a political force. They will play a more and more important part in the history of the nation. They will dance a new dance in the ballet of humanity. 
That recently so familiar figure in a short skirt of khaki and close-fitting cap, seated firmly but not too gracefully astride a motor bicycle rushing with its side-car, and often its male passenger, through the traffic is more than a phenomenon, it is a symbol. The air has whipped her cheeks pink and blown loose a stray lock above her determined eyes. What beauties she has of form or feature are none of them hid. She is all the woman that the world has known, but with a new purpose and a new poise. For good or ill she has entered the machine, and we
  • 42. came to look on her with an indifferent and familiar eye. But what will she do, what will she think, whither will she carry us in that side-car of hers? To all her ancient qualities she has added a new one: object of desire, mother of children, guardian of the hearth, mate of man or virgin saint, she has now another manifestation, that of fellow-combatant; some say, also of adversary. One might almost say that, bending over the handle-bars of her machine, with her body curved and her legs planted firmly on the footboard she mimes the very mark of interrogation which her changes of social posture present. A living query in khaki, she is a challenge to the prophet and the philosopher. One who is neither will let the challenge pass, sure only of one thing: that develop as she may and carry us where she will, the tradition of the good Englishwoman is safe in her keeping. “The good Englishwoman,” an untranslatable phrase (I beseech our French neighbours not to translate it la bonne anglaise), is an expression which has a corresponding reality. We all know it, in our flesh, in our bones, in our minds and in our souls. The Englishwoman is a definite person to all of us in England: she is not merely the female of the species living in these isles, she has a significance in the world at large. We love her and we honour her, but we do not often reflect what it is that we love and honour. It is a mental occupation which might be more frequently indulged in, were we not such indifferent reflectors. The ingenious Henry Adams, that enlightened but pensive American, whose death has just given us one of the most fascinating books of modern times, spent his whole life in reflecting on his countrymen, with results which are stimulating if not encouraging.
He did not spend so much time reflecting on his countrywomen, though he said that he owed more to them than to any man, but his reflections on that head resolved themselves into a question which no Englishman would formulate in similar circumstances. Henry Adams used to invite agreeable and witty people to dine, and, at an unexpected moment, to propound to the “brightest” of the women the question: “Why is the American woman a failure?” He meant a failure as a force rather than as an
  • 43. individual, but it was an irritating question all the same, nor is it surprising that it usually drew the answer: “Because the American man is a failure.” The Englishman would be too chivalrous to ask such a question of his guests, but he would not even formulate it. The Englishman, even a considerably sophisticated one, could never think of the Englishwoman as a failure, whether as an individual, a force or an inspiration. He is bound by his experience, his upbringing and his instincts to think of her as a success. Let us then put the question “Why is the Englishwoman a success?” We shall get no very good impromptu answers, nor do I suggest that “Because the Englishman is a success” would be the correct one. We should be the last to take so much credit to ourselves. We are justly proud of the Englishwoman, but what is it of which we are proud? Of all the approving epithets that have been applied to women, which do we choose for our own? Is our pride in their beauty, their brilliance, their courage, their wit, their tact, their energy, their endurance, their sagacity, their skill in handicraft, their devotion to their young, their taste in art and dress, their grace of movement, the sweetness of their speech or the greatness of their minds? Are they only an attraction or an independent force? Are they better mistresses or mothers? When Henry Adams lived in this country as a young man he found that "Englishwomen, from the educational point of view, could give nothing until they approached forty years old. Then they become very interesting—very charming—to the man of fifty." What do we say to such a criticism from so acute a mind? It is easier to ask questions than to answer them, and I propose to shirk the harder part of the task. Questions cannot be satisfactorily answered for other people, and, where everyone has to make up his or her mind, the mere asking of questions is in itself an aid to their solution. 
Each reader will answer the questions I have asked in a different way: having done so, he must pass to another consideration. We are proud of the Englishwoman, but we criticise her, again each one of us differently. We must consider the grounds of our criticism. She dresses badly, some will say; her hair is always untidy, say others; foreigners assert that she is proud and stupid;
  • 44. Englishmen, secretly glad that she is proud, try to forget that she is poorly educated. That she walks gracefully, none will say, but as an athlete she is second to none: it would be rash to say that her taste in the home is remarkable, but the atmosphere of home, which not even the most hideous decoration can kill nor the most beautiful create, emanates from her alone. As a housewife she has her glories and her failings. She has not the almost brutish industry of the German nor the avaricious acuteness of the French bourgeoise; she is, in general, neither expert in household industry nor in business. Nevertheless, the Englishman is only really contented in a household presided over and served by Englishwomen, and that is not only because they understand his wants, but because they are genial and simple, neither servile nor imperious, good comrades who do not expect too little or exact too much. Fearless in her actions, the Englishwoman is timid in her ideas: what she may do in the future is incalculable, her possibilities are unbounded; but there seem to be limits to the expansion, except by imitation, of her power of thought. As an administrator she will find no superior, but the political thinkers, as well as the artists, will for the most part come from other nations. These are but random criticisms which, among others, will occur to any mind that reflects upon the subject. They show, once more, that the essence of the Englishwoman or of her goodness is not a simple one. She is therefore an excellent topic for a conversation that should be provocative and stimulating. If I sustain one part, the reader will mentally sustain the other. Let us continue it. It is hardly necessary to say that any criticism of the Englishwoman in these pages is not an attack upon her: nor is any approbation to be considered a defence. 
At least I pay this much respect to my uncle Joseph that no woman shall flatter me into defending her: she is more than capable of doing this for herself. But, beyond this, I quite fail to understand what a friend of mine meant when he suggested that I should write in defence of women. “Against whom or against what?” I asked, but his explanation was not lucid. I gathered that he had in mind the complaint sometimes
  • 45. heard that women have ceased to be women in order to become inferior men; that they are getting hard and conceited; that they turn up their noses at the domestic virtues, at marriage and the whole conception of life as duty, and that they think only of having “a good time.” The isolated instances given as grounds for this complaint are, I am convinced, not typical. That women have developed and broken through the far too narrow restrictions of a hundred years ago is only a matter for thankfulness: something is always lost in every adjustment, but more is gained if the adjustment is natural. The flighty girl whom most grumblers of this kind have in mind is only a fraction, and a very imperfect fraction, of the Englishwoman. A far more serious line was taken by Henry Adams towards the end of his life, when he became finally convinced that he was a man of the eighteenth century living in an unfamiliar world whose guiding forces he could not fathom. Musing over the enormous mass of new forces put into the hand of man by the end of the nineteenth century, he wondered what should be the result of so much energy turned over to the use of women, according to the scientific notions of force. He could not write down the equation. The picture of the world that he saw was of man bending eagerly over the steering wheel of a rushing motor car too intent on keeping up a high speed and avoiding accidents to have leisure for any distractions. The old attraction of the woman, one of the most powerful forces of the past, had become a distraction, and woman, no longer able to inspire men, had been forced to follow them. Woman had been set free: as travellers, typists, telephone girls, factory hands, they moved untrammelled in the world. But in what direction were they moving? After the men, said Henry Adams; discarding all the qualities for which men had no longer any interest or pleasure, they too were bending over the steering wheel in the same rapid career. 
Woman the rebel was now free and there was only one thing left for her to rebel against, maternity, or the inertia of sex, to speak in terms of force. Inertia of sex, the philosopher truly remarked, could not be overcome without extinguishing the race, yet an immense force was working irresistibly to overcome it. What would happen? Henry Adams gave up the riddle, grateful for
  • 46. the illusion that woman alone of all the species was unable to change. Superficial observers might say that this movement has been accelerated by the war. Hundreds of homes have loosened their ties in the stress of war, thousands of unrebellious daughters have left their narrow walls at the call of patriotism and are now unwilling to return to them. They have learnt to live in the herd with their own sex, and prefer it to living with their own sex in the pen; physical danger and discomfort are no longer bogeys to frighten them; they have been “on their own,” and “on their own” they intend to stay. All very true, no doubt, with the added complication of serious competition between the sexes in a restricted labour market. At the same time, these superficial observers forget that there has been an extraordinary return to the traditional relations between men and women during the war. The inspiration of the woman has never been stronger; once more, after many years, men have fought for their women and the women have regarded their champions with gratitude; women have tended and worked for men in greater numbers and with greater alacrity than ever before in the history of the world; the comradeship between the sexes has grown warmer and stronger without destroying the still more natural relation, for marriage as an institution has enjoyed a season of abnormal popularity. In a country at war, especially in a country invaded, men and women return to the relations of extreme antiquity; the men fight to protect the home and the family, which they alone can do. If they are beaten, the home is destroyed and the women are ravished. We in England have escaped this last simplification: we have been lucky, but we have lost the directness of the lesson. Nevertheless, it is patent enough to thoughtful people. War has revealed men and women pretty much as they always have been, and the revelation will not be forgotten. 
The apprehensions of a Henry Adams, after the five years of war, do, in fact, appear to be exaggerated. The futility of all that vast array of mechanical force which so appalled him has