Semi-supervised projected model-based clustering

Guerra Velasco, Luis Pelayo, Bielza Lozoya, María Concepción ORCID: https://orcid.org/0000-0001-7109-2668, Robles Forcada, Víctor ORCID: https://orcid.org/0000-0003-3937-2269 and Larrañaga Múgica, Pedro María ORCID: https://orcid.org/0000-0003-0652-9872 (2014). Semi-supervised projected model-based clustering. "Data Mining and Knowledge Discovery", v. 28 (n. 4); pp. 882-917. ISSN 1942-4795. https://doi.org/10.1007/s10618-013-0323-0.

Description

Title: Semi-supervised projected model-based clustering
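Author(s): Guerra Velasco, Luis Pelayo; Bielza Lozoya, María Concepción; Robles Forcada, Víctor; Larrañaga Múgica, Pedro María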
Document Type: Article
Journal/Publication Title: Data Mining and Knowledge Discovery
Date: 2014
ISSN: 1942-4795
Volume: 28
Issue: 4
Subjects:
SDGs:
Informal Keywords: Clustering, Subspaces, Semi-supervised, Model-based, Partially labeled data
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Artificial Intelligence
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

We present an adaptation of model-based clustering for partially labeled data that is capable of finding hidden cluster labels. All clusters, both those originally known and those discovered, are represented using localized feature subset selections (subspaces), which yields clusters that global feature subset selection cannot discover. The semi-supervised projected model-based clustering algorithm (SeSProC) also includes a novel model selection approach that uses a greedy forward search to estimate the final number of clusters. The quality of SeSProC is assessed using synthetic data, demonstrating its effectiveness, under different data conditions, not only at classifying instances with known labels but also at discovering completely hidden clusters in different subspaces. SeSProC also outperforms three related baseline algorithms in most scenarios using synthetic and real data sets.
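
The general idea of combining partially labeled data with a greedy forward search over the number of clusters can be illustrated with a small sketch. This is not SeSProC itself: it omits the localized feature subset selection (subspaces) entirely, and the diagonal-covariance Gaussian mixture, the BIC stopping rule, the farthest-point initialization, and all function names (`fit_partially_labeled_gmm`, `greedy_forward_search`) are assumptions made here for illustration only. Labeled instances are pinned to their known component during EM, and one extra component at a time is added for as long as the score improves.

```python
import numpy as np

def log_joint(X, weights, means, variances):
    """(n, k) array of log(weight_j * N(x_i | mean_j, diag(variance_j)))."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2
    log_pdf = -0.5 * np.sum(
        np.log(2 * np.pi * variances)[None] + diff2 / variances[None], axis=2)
    return np.log(weights)[None, :] + log_pdf

def fit_partially_labeled_gmm(X, labels, k, n_iter=100, min_var=1e-2):
    """EM for a k-component diagonal Gaussian mixture (illustrative only).

    labels[i] in 0..C-1 pins instance i to that component; -1 means unlabeled.
    Returns (hard assignment per instance, BIC), where lower BIC is better.
    """
    n, d = X.shape
    known = labels >= 0
    n_classes = int(labels.max()) + 1          # assumes classes 0..C-1 all occur
    pinned = np.eye(k)[labels[known]]          # one-hot rows for labeled data

    # Initialization: labeled class means, then farthest-point seeds for the
    # extra (hidden) components. This choice is an assumption of the sketch.
    means = np.empty((k, d))
    for c in range(n_classes):
        means[c] = X[labels == c].mean(axis=0)
    for j in range(n_classes, k):
        dist = ((X[:, None, :] - means[None, :j, :]) ** 2).sum(axis=2).min(axis=1)
        means[j] = X[np.argmax(dist)]
    variances = np.tile(np.maximum(X.var(axis=0), min_var), (k, 1))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: soft responsibilities, overwritten for labeled instances.
        lj = log_joint(X, weights, means, variances)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        resp[known] = pinned
        # M-step: weighted updates; min_var guards against collapsing components.
        nk = resp.sum(axis=0) + 1e-9
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, min_var)

    # Final scoring: joint density for labeled rows, marginal for unlabeled rows.
    lj = log_joint(X, weights, means, variances)
    log_marginal = np.logaddexp.reduce(lj, axis=1)
    resp = np.exp(lj - log_marginal[:, None])
    resp[known] = pinned
    loglik = lj[np.flatnonzero(known), labels[known]].sum() + log_marginal[~known].sum()
    n_params = (k - 1) + 2 * k * d
    return resp.argmax(axis=1), -2.0 * loglik + n_params * np.log(n)

def greedy_forward_search(X, labels, max_extra=5):
    """Start from the known classes and add components while BIC decreases."""
    k = int(labels.max()) + 1
    best_assign, best_bic = fit_partially_labeled_gmm(X, labels, k)
    for _ in range(max_extra):
        assign, bic = fit_partially_labeled_gmm(X, labels, k + 1)
        if bic >= best_bic:
            break
        k, best_assign, best_bic = k + 1, assign, bic
    return k, best_assign

# Toy data: two partially labeled clusters plus one completely hidden cluster.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(size=(60, 2)) for c in centers])
labels = np.full(180, -1)
labels[:10] = 0
labels[60:70] = 1
k_found, assignment = greedy_forward_search(X, labels)
print("components selected:", k_found)
```

On well-separated toy data like this, the search would be expected to add at least one component beyond the two labeled classes to account for the hidden cluster. The variance floor `min_var` is a pragmatic guard for this sketch and would need tuning to the scale of real data.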