TELKOMNIKA Telecommunication, Computing, Electronics and Control
Vol. 18, No. 4, August 2020, pp. 1884∼1891
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v18i4.13858
Spatial association discovery process using frequent
subgraph mining
Giovanni Daián Rottoli (1), Hernán Merlino (2)
(1) Universidad Nacional de La Plata, Argentina
(1) Universidad Tecnologica Nacional, Argentina
(1,2) Information Systems Research Group, National University of Lanús, Buenos Aires
Article Info
Article history:
Received Aug 10, 2019
Revised Mar 10, 2020
Accepted Apr 3, 2020
Keywords:
Frequent subgraph mining
SARM
Spatial association mining
Spatial data mining
Spatial knowledge discovery
ABSTRACT
Spatial associations are one of the most relevant kinds of patterns used by business
intelligence regarding spatial data. Due to the characteristics of this particular type of
information, different approaches have been proposed for spatial association mining.
This wide variety of methods has entailed the need for a process that integrates the
activities for association discovery, one that is easy to implement and flexible enough to
be adapted to any particular situation, and that can guide the discovery of useful patterns,
particularly in small and medium-size projects. Thus, this work proposes an adaptable
knowledge discovery process that uses graph theory to model different spatial
relationships from multiple scenarios, and frequent subgraph mining to discover spatial
associations. A proof of concept is presented using real data.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Giovanni Daián Rottoli,
Departamento de Ingeniería en Sistemas de Información,
Universidad Tecnológica Nacional, F.R. Concepción del Uruguay,
676 Ing. Pereira Street, Concepción del Uruguay (3260), Entre Ríos, Argentina.
Email: rottolig@frcu.utn.edu.ar
1. INTRODUCTION
Spatial knowledge discovery aims to find useful and novel patterns in spatial datasets to support
decision-making in a particular problem domain [1]. Among all the possible patterns to discover, spatial
associations are among the most commonly used today in multiple fields such as climatology, geography, geology,
criminology and ecology, among many others. They are comprised of predicates that involve spatial objects
along with spatial and non-spatial relationships between those objects [2]. Several characteristics of spatial
data make this data mining task more complicated, such as the spatial dependency of data attributes, the
multiplicity of spatial data representation models, the spatial relations between data objects, and some
particular spatial properties such as spatial autocorrelation and spatial heterogeneity [3].
Multiple algorithms have been developed for association pattern mining. In general, each of
these algorithms aims to solve particular concerns about the aforementioned challenges. Selecting
a proper algorithm has become an arduous activity due to the growing number of alternatives
and their variants, especially for inexperienced users. Thus, it is necessary to provide a new process for small or
medium-size application domains, one that is easy to implement and flexible enough to be adapted to multiple
contexts. Consequently, this paper proposes a new process for association discovery from spatial data
that utilizes graph theory to model spatial objects and the relations between them, and frequent subgraph mining
to find the substructures with a high repetition rate inside the general graph. These substructures correspond to
association patterns. The proposal is a new alternative for modelling complex situations from a particular problem
domain. It is not intended to replace or improve on the results of state-of-the-art algorithms; rather, it provides a
road map to initially address a problem. The rest of this paper is arranged as follows: section 2 describes the
characteristics of spatial data; section 3 presents association patterns and their characteristics regarding spatial data;
section 4 introduces the proposed process for the discovery of spatial associations; section 5 shows a proof of concept
using real-world data; lastly, section 6 contains conclusions and future work.
Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA
2. SPATIAL DATA
Spatial data is a particular type of dependent data. Formally, a spatial database $D$ is a set of spatial
records $D = \{T_1, T_2, \dots, T_d\}$ with $T_i = \{S_i^1, S_i^2, \dots, S_i^m, X_i^1, X_i^2, \dots, X_i^n\}$,
where each $S_i^k$ is a spatial attribute that stores values about the spatial context, and each $X_i^l$ is a
non-spatial attribute with values measured at particular locations [3, 4]. The non-spatial attributes may be numerical or categorical according to
the problem domain and the spatial attributes may be specified as coordinates or places (e.g. city name or
state code). Additionally, there are three basic types of spatial objects: points, used to model specific punctual
locations in the space; lines, used to model linear extensions such as rivers or roads; and polygons, used to
represent objects that have a two-dimensional extension in the space, such as regions or states.
The dependence of non-spatial attributes on spatial ones means that different implicit spatial relations
can be extracted from data. Let $D$ be a spatial database; a relation $R \subseteq D^2$ is called spatial if and only if it is
defined through a binary predicate $P(x, y)$, with $x, y \in D$, that involves the spatial attributes of the spatial objects
$x$ and $y$. For example, the spatial relation $N \subseteq D^2$, defined by the following predicate, is
the neighborhood relation between two spatial points using Euclidean distance:
$x N y \iff \mathrm{Dist}(x, y) < \lambda, \quad \lambda \in \mathbb{R}^+$
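The neighborhood relation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: points are plain `(x, y)` tuples in a planar coordinate system, the function names `dist` and `neighborhood_relation` are hypothetical, and pairs of a point with itself are left out by choice even though the predicate as written would admit them.

```python
import math

def dist(p, q):
    """Euclidean distance between two planar points given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def neighborhood_relation(points, lam):
    """Return N as a set of index pairs (i, j), i != j, with Dist < lam."""
    return {
        (i, j)
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and dist(p, q) < lam  # self-pairs skipped by assumption
    }

# Toy data: two nearby points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
N = neighborhood_relation(points, lam=2.0)
```

With these three points, only the first two are closer than the threshold, so the relation is symmetric and contains both orderings of that pair.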
These relations can be classified as geometric, if they are related to the principles of Euclidean
geometry (e.g. neighbouring relationships); directional, when they refer to relative spatial orientations (e.g. above,
below, north, east); topological, if they are independent from the concepts of distance and direction and are
not affected by spatial transformations such as rotation or translation (e.g. intersect, inside); or hybrid, if they
are related to two or more of the aforementioned types of properties. These relationships can be calculated
using different methods depending on the problem domain and the class of spatial data used: points, lines or
polygons [5, 6].
On the other hand, two properties are derived from spatial dependence: spatial autocorrelation, i.e.,
observations of spatially distributed random variables are not location-independent, and spatial heterogeneity,
i.e., patterns found in some region of the space may not have the same support in another region. Spatial
autocorrelation refers to the fact that spatial data are not distributed independently throughout the space.
The distribution depends on the characteristics of the data points, the characteristics of the underlying space or
the spatial neighboring relationships. For example, churches tend to be located near public squares, and animals
tend to travel to locations that contain their food sources [7]. Spatial heterogeneity is related to spatial
autocorrelation. This phenomenon describes the local nature of spatial patterns, which are subordinated to some
specific locations. Thus, a spatial pattern, such as an association rule, may have a high support value in one region
and a low support value in a different one. This phenomenon is also known as Simpson's paradox [8]. All these
particular characteristics make knowledge extraction from spatial data a complex activity, which not
only has to consider patterns between data records, but also the implicit relationships between spatial objects.
3. SPATIAL ASSOCIATIONS
One of the most common patterns to find in data is the association pattern. An association pattern
P is defined as an n-ary predicate P = (p1, p2, · · · , pn) with a high probability of occurrence in the dataset.
Its classic application is the supermarket basket analysis to discover whether or not there is some correlation
between items that are bought together. An association pattern is referred to as spatial if at least one of its
atomic predicates pk involves a spatial relationship between its variables [2]. For example, in a city C, churches
and public squares tend to be neighbors: City(C) ∧ Church(X) ∧ PublicSquare(Y) ∧ Inside(X, C) ∧
Inside(Y, C) ∧ Neighbors(X, Y).
As shown in the previous example, Inside(X, C), Inside(Y, C) and Neighbors(X, Y ) are spatial
predicates related to topological and geometric relationships. Many different relations must be taken into
consideration at the same time to find useful spatial associations. Also, these relations must be calculated in
local contexts, due to the aforementioned Simpson’s Paradox.
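As a rough illustration of what checking such a predicate against data involves, the sketch below counts the instances of Church(X) ∧ PublicSquare(Y) ∧ Neighbors(X, Y) in a toy dataset. Everything here is an assumption made for the example: the records, the threshold of 1.0 distance units, and the simplification that all objects lie inside the same city, so the Inside predicates are omitted.

```python
import math

# Hypothetical spatial objects: an identifier, a type and planar coordinates.
objects = [
    {"id": "ch1", "type": "Church", "xy": (0.0, 0.0)},
    {"id": "sq1", "type": "PublicSquare", "xy": (0.5, 0.0)},
    {"id": "ch2", "type": "Church", "xy": (10.0, 10.0)},
    {"id": "sq2", "type": "PublicSquare", "xy": (10.0, 10.4)},
    {"id": "ch3", "type": "Church", "xy": (20.0, 0.0)},
]

def neighbors(a, b, lam=1.0):
    """Geometric predicate: the two objects are closer than lam."""
    return math.dist(a["xy"], b["xy"]) < lam

def pattern_instances(objs):
    """All (X, Y) with Church(X), PublicSquare(Y) and Neighbors(X, Y)."""
    return [
        (x["id"], y["id"])
        for x in objs if x["type"] == "Church"
        for y in objs if y["type"] == "PublicSquare" and neighbors(x, y)
    ]

matches = pattern_instances(objects)
churches = [o for o in objects if o["type"] == "Church"]
# Share of churches that take part in at least one instance of the pattern.
share = len({x for x, _ in matches}) / len(churches)
```

Here two of the three churches have a neighboring public square, which is the kind of local frequency the rest of the process tries to detect.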
Multiple efforts have been made to find spatial association patterns in spatial databases: [7]
proposes a method for spatial association mining that considers spatial autocorrelation by using a cell structure;
[9] focuses on the problem of rule extraction from spatial data with crisp condition attributes and fuzzy
decisions, using a rule extraction model based on rough fuzzy sets to deal with both fuzziness and roughness; [10]
combines and extends techniques developed in both spatial and fuzzy data mining to deal with the uncertainty
found in typical spatial data. This proposal uses fuzzy logic to obtain relevant information from transition areas
between spatial neighborhoods for spatial association mining and for spatial relationship modelling; [11, 12]
propose an algorithm for local pattern discovery considering spatial heterogeneity that incorporates a novel
spatial metric for support evaluation based on event density in a particular area; [13] presents a specially
designed algorithm to discover spatial associations related to El Niño Southern Oscillation (ENSO); [14] applies
an algorithm that explores multiple hierarchies of spatial objects; [15] uses A-Priori-based approaches to find
spatial association rules; [6, 16] propose using Inductive Logic Programming (ILP) for this data mining
purpose by modelling and extracting high-support spatial relations from spatial data; [17] worked with
metaheuristics such as genetic algorithms and evolutionary programming; [18] suggested a data-transformation
approach before using traditional association rule mining algorithms; [19] introduced non-trivial structures
such as graphs for spatial relationship representation; among others.
Because of this variety of spatial data mining approaches for association discovery, it is difficult to
select a proper algorithm or method to be used in small knowledge discovery application contexts. Therefore,
a unified and general process is required to deal with the aforementioned problems, and it has to be easy to
implement and flexible enough to be adapted to multiple particular situations.
4. SPATIAL ASSOCIATION DISCOVERY PROCESS
This work describes a new process for spatial association extraction considering the possibility of
having multiple relationships between spatial objects of any kind (i.e. points, lines, polygons), and considering
spatial autocorrelation and spatial heterogeneity. This process is designed as a first approach to obtaining spatial
association knowledge from data in particular contexts, and to be easy to implement in small or medium-size projects.
The process (Figure 1) is divided into 5 main steps: data preparation (section 4.1), neighborhood definition
(section 4.2), modelling of spatial relationships using graphs (section 4.3), frequent subgraph mining (section
4.4) and evaluation of results (section 4.5).
Figure 1. Spatial association discovery process
4.1. Data preparation
The proposed process starts with a spatial data preparation step. It is necessary to codify the various
spatial datasets obtained from different sources in different formats, in order to enable the extraction of relations
between all the data instances in later steps. In general terms, it is not uncommon to have multiple layers of
spatial objects, each of them with a particular representation type and related to a particular scenario from the problem
domain. On the other hand, two types of datasets must be considered: target datasets, with objects directly
related to the problem domain that are going to be present in every association pattern, and relevant datasets,
that may or may not be related to the target datasets, but add important information that may be useful for
decision making [20].
These data must be prepared by cleaning errors, solving inconsistent and null values, and dealing with
outliers. New attributes or even new data objects could be generated using the input data. This step requires
considerable effort and may require many iterations. Thus, it is advisable to implement the process using a
proper methodology such as CRISP-DM [21].
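A minimal sketch of this preparation step, assuming every layer holds point locations, might look as follows. The layer names, the sample coordinates and the function name `prepare` are hypothetical, and real projects usually need far more cleaning than the null and duplicate handling shown here.

```python
def prepare(layers):
    """Merge point layers into one list of records.

    layers maps a facility type to a list of (lat, lon) tuples, where
    either coordinate may be None. Records with null coordinates and
    exact duplicates within a layer are dropped.
    """
    records = []
    seen = set()
    for facility_type, coords in layers.items():
        for lat, lon in coords:
            if lat is None or lon is None:
                continue  # solve null values by discarding the record
            key = (facility_type, round(lat, 6), round(lon, 6))
            if key in seen:
                continue  # skip duplicated points
            seen.add(key)
            records.append({"type": facility_type, "lat": lat, "lon": lon})
    return records

# Hypothetical input layers, one per data file.
layers = {
    "church": [(-34.60, -58.40), (-34.60, -58.40), (None, -58.50)],
    "library": [(-34.61, -58.41)],
}
dataset = prepare(layers)
```

The result is a single list in which each record carries two spatial attributes and one non-spatial attribute (the type), which is the shape assumed by the later steps.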
4.2. Neighborhood definition
As mentioned before, a particular spatial association pattern may have a higher occurrence probability
in some regions and lower probability in others [8]. For this reason it is preferred to search for this kind of
pattern locally. For this, we propose defining partitions of the dataset, called neighborhoods in this context, and
the subsequent execution of the association pattern search algorithm on each of them.
These neighborhoods can be defined beforehand using knowledge from the problem domain, or
using spatial clustering techniques. Using density-based or distance-based spatial clustering algorithms [22–
24] is suggested due to the First Law of Geography, which states that spatial objects located together are more
closely related than those that are far away from each other [25, 26]. Nonetheless, there is an issue to consider
in this step: the limits between neighborhoods may add important information for spatial association mining.
Thus, the use of fuzzy clustering techniques or flexible boundaries models may be desirable.
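To make the idea of neighborhoods concrete, the following sketch groups points by a simple distance-based rule: two points belong to the same neighborhood when they are linked by a chain of points at most `eps` apart. This is a deliberately simple stand-in, not one of the density-based algorithms cited above, and the names `distance_clusters`, `eps` and `min_size` are assumptions made for the example.

```python
import math
from collections import defaultdict

def distance_clusters(points, eps, min_size=1):
    """Group point indices into connected components of the eps-graph."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two groups

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    # Small groups are treated as noise and discarded.
    kept = [g for g in groups.values() if len(g) >= min_size]
    return sorted(kept, key=lambda g: g[0])

points = [(0, 0), (0, 1), (1, 1), (10, 10), (10, 11), (50, 50)]
neighborhoods = distance_clusters(points, eps=1.5, min_size=2)
```

The isolated point is left out of every neighborhood, while the remaining points form two groups on which the pattern search would then run separately.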
4.3. Modelling of spatial relationships using graphs
Next, the spatial relations between the target data instances and the instances
of the relevant dataset from each neighborhood must be calculated. Depending on the problem domain, different types of spatial
relations can be calculated: geometric, topological, directional or hybrid relationships, as mentioned above [6].
This step might have a high computational cost.
Graph theory is proposed to model the spatial relationships due to its close relation with first order
logic and the pattern to find [16]. Graphs are discrete structures consisting of vertices and edges that connect
these vertices. There are different kinds of graphs, depending on whether edges have directions (digraphs),
whether multiple edges can connect the same pair of vertices (multigraphs), and whether loops are allowed.
Formally, a simple graph G = (V, E) consists of V, a nonempty set of vertices (or nodes), and E, a set
of edges. Each edge has two vertices associated with it, called its endpoints. An edge is said to connect its
endpoints. To relate each edge to its endpoints, a function $\phi : E \to \{\{v_1, v_2\} : v_1, v_2 \in V\}$, called the incidence
function, is used. A multigraph, on the other hand, is a graph where multiple edges can be associated with
the same endpoints. Additionally, each vertex and each edge can be labeled with data related to the represented
object. This structure can be adapted to multiple scenarios, and multiple efficient algorithms can be used to
extract valuable information such as maximum cliques [27].
In the context of this work, multigraphs are used to model spatial objects as vertices and the relations
between them as edges. A small example can be seen in Figure 2 (a). Two sets of labels and two extra
functions to assign those labels to the vertices and edges are needed. So, let G be a multigraph without loops
$G = (V, E, L, K, \phi, l, k)$ where: V is the vertex set of G, which corresponds to the spatial objects from the
datasets; E is the edge set that corresponds to each calculated relationship between the spatial objects; L is the
vertex label set with the characteristics of the spatial data objects; K is the edge label set, with the characteristics
of each spatial relation; $\phi : E \to \{x \in \mathcal{P}(V) : |x| \le 2\}$ is the incidence function; $l \subseteq V \times L$ and $k \subseteq E \times K$
are labeling relations.
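One possible in-memory representation of this labeled multigraph is sketched below. It is an assumption-laden simplification: the class name `SpatialMultigraph` is hypothetical, and the labeling relations l and k are stored as plain dictionaries, so each vertex and each edge carries a single label rather than an arbitrary set of labels.

```python
from dataclasses import dataclass, field

@dataclass
class SpatialMultigraph:
    vertex_labels: dict = field(default_factory=dict)  # l: vertex -> label
    edges: dict = field(default_factory=dict)          # phi: edge id -> endpoints
    edge_labels: dict = field(default_factory=dict)    # k: edge id -> label

    def add_vertex(self, v, label):
        self.vertex_labels[v] = label

    def add_edge(self, u, v, label):
        """Add an edge; parallel edges are allowed, loops are rejected."""
        if u == v:
            raise ValueError("loops are not allowed")
        if u not in self.vertex_labels or v not in self.vertex_labels:
            raise KeyError("both endpoints must be existing vertices")
        edge_id = len(self.edges)
        self.edges[edge_id] = frozenset((u, v))
        self.edge_labels[edge_id] = label
        return edge_id

# Vertices and relations taken from the church and public square example.
g = SpatialMultigraph()
g.add_vertex("A", "City")
g.add_vertex("X1", "PublicSquare")
g.add_vertex("Y1", "Church")
g.add_edge("A", "X1", "Include")
g.add_edge("A", "Y1", "Include")
first = g.add_edge("X1", "Y1", "Neighbors")
second = g.add_edge("X1", "Y1", "Faces")  # hypothetical second relation
```

Because edges are keyed by their own identifiers rather than by their endpoints, two different relations between the same pair of objects can coexist, which is the property that motivates using a multigraph.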
The aforementioned structure makes it possible to model multiple different relationships with the
same endpoints, labeled with different attributes. Also, many attributes of spatial objects can be taken into
consideration. Additionally, it must be noted that loops (i.e. edges with only one endpoint) are not considered
because of their lack of semantics in this context (there are no spatial relationships that involve only one spatial
object). Fuzzy logic could also be a valuable tool to model the spatial relationships, if the situation requires it
[10]. More information about fuzzy logic can be found in [28].
4.4. Frequent subgraph mining
To extract spatial associations with a high probability of occurrence, frequent subgraph mining is
applied to each modeled graph. Given a multigraph $G = (V, E, L, K, \phi, l, k)$ like the one described in
the previous section, the frequent subgraph mining problem in a single multigraph consists of finding recurring subgraphs
$G_i \subset G$, in other words, subgraphs that have multiple instances in the original graph (Figure 2 (b)). It must
be noted that two graphs are considered isomorphic if there is a one-to-one correspondence between their vertices
and edges that preserves their labels. These
frequent subgraphs represent the relationships between spatial object types that take place in the space with a
high occurrence probability.
Multiple algorithms have been designed for frequent subgraph mining in a single big graph,
calculating the relevance of a pattern in different ways. Well-known examples include IncGM+, FSSG and
SUBDUE, among others [29, 30]. A set of frequent subgraphs for each neighborhood is obtained as a result of
this step and must be analyzed to obtain useful knowledge for decision-making.
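The smallest possible case of this problem, patterns made of a single labeled edge, can be sketched as a counting exercise. The code below is not an implementation of IncGM+, FSSG or SUBDUE; it is a naive illustration with assumed names (`frequent_edge_patterns`, `min_support`) that counts every occurrence, whereas real single-graph miners generally use support measures that account for overlapping instances.

```python
from collections import Counter

def frequent_edge_patterns(vertex_labels, edges, min_support):
    """Count (vertex label, edge label, vertex label) patterns.

    edges is a list of (u, v, edge_label) triples over an undirected
    graph, so the two vertex labels are sorted to make the pattern
    independent of the order in which the endpoints are listed.
    """
    counts = Counter()
    for u, v, edge_label in edges:
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        counts[(a, edge_label, b)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Hypothetical neighborhood graph.
vertex_labels = {
    1: "post office", 2: "nightclub", 3: "post office",
    4: "nightclub", 5: "church", 6: "school",
}
edges = [
    (1, 2, "close to"), (3, 4, "close to"),
    (4, 1, "close to"), (5, 6, "close to"),
]
frequent = frequent_edge_patterns(vertex_labels, edges, min_support=2)
```

Larger patterns would require growing these single-edge candidates and testing subgraph isomorphism, which is the expensive part that the dedicated algorithms address.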
4.5. Evaluation of results
In the final step, the frequent subgraphs are translated into n-ary predicates, and those that represent trivial information
(non-novel patterns) must be filtered out. The support and confidence measures can be extracted, selecting the
metrics that the decision-maker considers most appropriate. This activity could be performed automatically,
or manually by an analyst with knowledge about the problem domain with help from an expert.
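A small sketch of this evaluation step is given below, assuming that frequent patterns arrive as (vertex label, edge label, vertex label) triples with a support count and that the analyst supplies a list of patterns already known to be trivial. The function names and the sample data are hypothetical.

```python
def to_predicate(pattern):
    """Translate a single-edge pattern into a readable predicate."""
    def name(label):
        # "post office" -> "PostOffice"
        return label.title().replace(" ", "")

    a, relation, b = pattern
    return f"{name(a)}(x1) ∧ {name(b)}(x2) ∧ {name(relation)}(x1, x2)"

def filter_novel(patterns, known):
    """Drop patterns listed as trivial and order the rest by support."""
    kept = {p: s for p, s in patterns.items() if p not in known}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)

# Hypothetical mining output and analyst knowledge.
patterns = {
    ("post office", "close to", "nightclub"): 3,
    ("church", "close to", "public square"): 5,
}
known_trivial = {("church", "close to", "public square")}
report = [(to_predicate(p), s) for p, s in filter_novel(patterns, known_trivial)]
```

Keeping the list of trivial patterns as explicit input reflects the point made above that this filtering depends on domain knowledge rather than on the mining algorithm.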
[Figure 2 (a): graph with vertices CityA, PublicSquareX1 and ChurchY1, and edges labeled Include, Include and Neighbors]
Figure 2. (a) Simplified example of spatial relationship modelling using a simple graph. (b) Example of a
frequent subgraph (bottom) found in a simple graph without labels on the edges.
5. PROOF OF CONCEPT
The proof of concept presented in this section is intended to show how the proposed process works when
implemented with different programming and data mining tools. The data used in this example consists of 10
data files containing the location of facilities in Buenos Aires (Argentina) and its surroundings. These facilities
include libraries (74), clinics (63), post offices (55), sports halls (50), nightclubs (41), schools (107), gas stations
(97), churches (125), museums (37) and police stations (93).
In the preparation step of the proposed process, the data files were integrated into
a single data file of spatial points using QGis (http://qgis.org/). Each spatial point is comprised of two spatial
attributes, Latitude and Longitude, and one non-spatial attribute, the type of building from the previous list.
After that, the points located outside the Buenos Aires limits were filtered out to reduce the search space,
leaving 742 spatial points (Figure 3 (a), orange). Then, in the neighborhood definition step, the HDBSCAN
clustering algorithm [31] from the 'dbscan' library for the R programming language was used on the spatial
data attributes to generate two neighborhoods with a minimum number of points equal to 50 in each of them
(Figure 3 (a), blue). Only two neighborhoods were used, for explanatory purposes.
In the next step, for each of the generated neighborhoods, a geometric relationship between their data
points was extracted, forming a graph with vertices labeled with the type of facility related to each data point
and edges labeled "close to" if the adjacent points were less than 150 meters away from
each other (this value was selected for illustrative purposes only). Thus, two graphs were created: one with 71
vertices and 45 edges in neighborhood 1, and another with 15 vertices and 11 edges in neighborhood 2.
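The graph construction described above can be approximated with the following sketch. It assumes that distances between latitude and longitude pairs are computed with the haversine formula and a mean Earth radius of 6,371 km, which is an assumption of this example and not something stated for the proof of concept; the sample coordinates and the function names are likewise hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def close_to_graph(points, threshold_m=150.0):
    """Build vertex labels and 'close to' edges for one neighborhood."""
    vertices = {i: p["type"] for i, p in enumerate(points)}
    edges = [
        (i, j, "close to")
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if haversine_m(points[i]["lat"], points[i]["lon"],
                       points[j]["lat"], points[j]["lon"]) < threshold_m
    ]
    return vertices, edges

# Hypothetical points: the first two are roughly 89 m apart.
points = [
    {"type": "post office", "lat": -34.6037, "lon": -58.3816},
    {"type": "nightclub", "lat": -34.6045, "lon": -58.3816},
    {"type": "church", "lat": -34.6200, "lon": -58.3816},
]
vertices, edges = close_to_graph(points)
```

The pairwise comparison is quadratic in the number of points, which is acceptable for neighborhoods of this size but would call for a spatial index on larger datasets.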
To obtain the frequent subgraphs of each of the generated graphs, the SUBDUE algorithm was used
via its implementation in the Subdue Graph Miner software, using the compression rate as the support measure.
The result was one subgraph, shown in Figure 3 (b), with a compression rate of 15.5% in neighborhood
1, which was translated into the predicate
Post office(x1) ∧ Nightclub(x2) ∧ Close to(x1, x2)
and two subgraphs in neighborhood 2, both with a compression rate of 27.2%, which were translated into the predicates
Clinic(x1) ∧ Post office(x2) ∧ Close to(x1, x2)
Post office(x1) ∧ Sport hall(x2) ∧ Close to(x1, x2)
Figure 3. (a) Spatial neighborhoods generated for the proof of concept using HDBSCAN algorithm; (b)
Results of the proof of concept.
5.1. Discussion
The first contribution of the proposed process is the possibility of adapting it to multiple scenarios,
due to its flexible underlying structure being based on graphs. Some of the aforementioned methods use
flexible structures too [6, 16], but the complexity of these methods increases because of the use of techniques
based on Logic Programming. On the other hand, some other methods do not take into account complex
patterns [19]. Furthermore, the possibility of including valuable information related to the data objects and the
spatial relations by using labels in the graph representation is also considered. Generally, the data structures
involved in other methods do not take into account complex data associated with the spatial relations between spatial data.
In relation to the above, the proposed process considers spatial phenomena such as autocorrelation and
heterogeneity by using spatial neighborhoods. Some alternatives, such as [7], consider spatial autocorrelation
but not spatial heterogeneity or complex data relationships. In most of the cases studied, these
characteristics are present due to their relevance in data mining.
Also, related to this, the proposed process can be implemented using existing tools such
as frequent subgraph mining algorithms and clustering algorithms. Some of the state-of-the-art alternatives
include very flexible and powerful strategies, but their implementation is hard, making them unsuitable for
application in small or medium-size projects [6, 9, 16, 19]. Lastly, the high adaptability of the procedure is a desired
characteristic due to the possibility of selecting among many algorithms for the implementation of each step.
Usually, the state-of-the-art methods propose a single alternative for their execution.
6. CONCLUSION
This work describes a knowledge discovery process for the extraction of spatial associations.
The process is flexible enough to take into account multiple and varied spatial relationships between spatial
objects of any kind, using a graph structure to model them. Heterogeneity and autocorrelation phenomena are
also considered, defining neighborhoods where the search process is performed to find this class of regularity.
The solution was designed as an initial approach to this data mining task, without worrying too much about
particular characteristics of data mining algorithms. In a large-scale project, this process could guide the selection
of specific methods based on the results obtained in the first iterations of an incremental methodology. A proof
of concept is presented as well, using real data to illustrate how the process is implemented using different
programming and data mining tools in each of the proposed steps.
In future work, the research will focus on implementation strategies according to the problem
domain for each of the steps of the process, in order to decrease computational execution time when dealing
with large numbers of spatial objects and spatial relationships. Also, fuzzy methods will be considered for
relation modelling and neighborhood definition.
ACKNOWLEDGEMENT
The research presented in this paper was partially funded by the PhD Scholarship Program to reinforce
R&D&I areas (2016-2020) of the Universidad Tecnológica Nacional and the Research Project 80020160400001
LA of National University of Lanús. The authors also want to extend their gratitude to Kevin-Mark Bozell
Poudereux, for proofreading the translation.
REFERENCES
[1] R. Garcia-Martinez, P. Britos, and D. Rodriguez, “Information mining processes based on intelligent sys-
tems,” International Conference on Industrial, Engineering and Other Applications of Applied Intelligent
Systems, pp. 402-410, 2013.
[2] K. Koperski and J. Han, “Discovery of spatial association rules in geographic information databases,”
International Symposium on Spatial Databases, pp. 47–66, 1995.
[3] Y. Leung et al., “Knowledge discovery in spatial data,” Springer, 2010.
[4] C. C. Aggarwal, “Data mining: The textbook,” Springer, 2015.
[5] R. Agrawal, et al., “Fast algorithms for mining association rules,” Proc. 20th int. conf. very large data
bases, vol. 1215, pp. 487-499, 1994.
[6] A. Appice, M. Ceci, A. Lanza, F. A. Lisi, and D. Malerba, “Discovery of spatial association rules in
geo-referenced census data: A relational mining approach,” Intelligent Data Analysis, vol. 7, no. 6, pp.
541-566, 2003.
[7] J. Chen, “An algorithm about association rule mining based on spatial autocorrelation,” The International
Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 37, no. B6b,
pp. 99-106, 2008.
[8] E. H. Simpson, “The interpretation of interaction in contingency tables,” Journal of the Royal Statistical
Society. Series B (Methodological), vol. 13, no. 2, pp. 238–241, 1951.
[9] H. Bai, Y. Ge, J. Wang, D. Li, Y. Liao, and X. Zheng, “A method for extracting rules from spatial data
based on rough fuzzy sets,” Knowledge-Based Systems, vol. 57, pp. 28-40, 2014.
[10] R. Ladner, F. E. Petry, and M. A. Cobb, “Fuzzy set approaches to spatial data mining of association rules,”
Transactions in GIS, vol. 7, no. 1, pp. 123-138, 2003.
[11] Z. Sha and X. Li, “Mining local association patterns from spatial dataset,” Seventh International Confer-
ence on Fuzzy Systems and Knowledge Discovery (FSKD), vol. 3, pp. 1455-1460, 2010.
[12] Z. Sha, X. Tan, and Y. Bai, “Localized spatial association: A case study for understanding vegetation
successions in a typical grassland ecosystem,” Geo-Informatics in Resource Management and Sustainable
Ecosystem, pp. 33-45, 2015.
[13] X. Cunjin and L. Xiaohan, “Novel algorithm for mining ENSO-oriented marine spatial association pat-
terns from raster-formatted datasets,” ISPRS International Journal of Geo-Information, vol. 6, no. 5,
pp. 1-15, 2017.
[14] A. Salleb and C. Vrain, “An application of association rules discovery to geographic information
systems,” European Conference on Principles of Data, pp. 613-618, 2000.
[15] S. S. U. Sutjipto, I. S. Sitanggang, and B. Barus, “Potential usage estimation of ground water using spatial
association rule mining,” TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol.
15, no. 1, pp. 504-511, 2017.
[16] D. Malerba, F. Esposito, F. A. Lisi, and A. Appice, “Mining spatial association rules in census data,”
Research in Official Statistics, vol. 5 no. 1, pp. 19-44, 2003.
[17] A. H. Goudarzi and N. Ghadiri, “A hybrid spatial data mining approach based on fuzzy topological rela-
tions and moses evolutionary algorithm,” Artificial Intelligence, Cornell University 2017.
[18] I. Lee, “Mining multivariate associations within gis environments,” International Conference on Indus-
trial, Engineering and Other Applications of Applied Intelligent Systems, pp. 1062-1071, 2004.
[19] H. Yang, S. Parthasarathy, and S. Mehta, “Mining spatial object associations for scientific data,” II Inter-
national Joint Conference on Artificial Intelligence (IJCAI), pp. 902-907, 2005.
[20] V. Bogorny, P. M. Engel, and L. O. Alvares, “Geoarm: an interoperable framework to improve geographic
data preprocessing and spatial association rule mining.” SEKE, pp. 79-84, 2006.
[21] R. Wirth and J. Hipp, “Crisp-dm: Towards a standard process model for data mining,” Proceedings of the
4th international conference on the practical applications of knowledge discovery and data mining, pp.
29-39, 2000.
[22] J. Sander,et al., “Density-based clustering in spatial databases: The algorithm gdbscan and its applica-
tions,” Data mining and knowledge discovery, vol. 2, no. 2, pp. 169-194, 1998.
[23] Y. Zhu, K. M. Ting, and M. J. Carman, “Density-ratio based clustering for discovering clusters with
varying densities,” Pattern Recognition, vol. 60, pp. 983-997, 2016.
[24] A. Sharma, R. Gupta, and A. Tiwari, “Improved density based spatial clustering of applications of noise
clustering algorithm for knowledge discovery in spatial data,” Mathematical Problems in Engineering,
vol. 2016, 2016.
[25] W. R. Tobler, “Cellular geography,” Philosophy in geography, pp. 379-386, 1979.
[26] J. Duan, L. Wang, and X. Hu, “The effect of spatial autocorrelation on spatial co-location pattern mining,”
International Conference on Computer, Information and Telecommunication Systems, pp. 210-214, 2017.
[27] G. D. Rottoli, H. Merlino, and R. García-Martinez, “Co-location rules discovery process focused on
reference spatial features using decision tree learning,” International Conference on Industrial, Engineering
and Other Applications of Applied Intelligent Systems, pp. 221-226, 2017.
[28] D. J. Dubois, “Fuzzy sets and systems: theory and applications,” Academic press, vol. 144, 1980.
[29] E. Abdelhamid, M. Canim, M. Sadoghi, B. Bhattacharjee, Y. Chang, and P. Kalnis, “Incremental frequent
subgraph mining on large evolving graphs,” IEEE Transactions on Knowledge and Data Engineering, vol.
29, no. 12, pp. 2710-2723, 2017.
[30] D. Kavitha, V. Kamakshi, and J. Murthy, “Finding frequent subgraphs in a single graph based on
symmetry,” International Journal of Computer Applications, vol. 146, no. 11, pp. 0975-8887, 2016.
[31] R. J. Campello, D. Moulavi, A. Zimek, and J. Sander, “Hierarchical density estimates for data clustering,
visualization, and outlier detection,” ACM Transactions on Knowledge Discovery from Data (TKDD),
vol. 10, no. 1, pp. 5, 2015.
BIOGRAPHIES OF AUTHORS
Giovanni Daián Rottoli is a researcher at the Computational Intelligence and Software Engineering
Research Group (GIICIS) from the National University of Technology (Argentina). He holds a
Bachelor's Degree in Information Systems from the aforementioned university (2015). He is currently a
Ph.D. candidate in Computer Science at the National University of La Plata (Argentina). He works
as an associate professor of Discrete Mathematics and Data Science at the National University of
Technology (Argentina). His research is focused on the fields of spatial data mining and knowledge
discovery, artificial intelligence and search-based software engineering. Further info can be found on
his profile: http://www.frcu.utn.edu.ar/giicis/rottolig/
Hernán Merlino is the head of the Advanced Information Systems Laboratory at Buenos Aires
University (Argentina) and the head of the Artificial Intelligence Laboratory at the National University
of Lanús (Argentina). He is a fellow of the Gas and Petroleum Institute at Buenos Aires University.
He holds a Bachelor’s Degree in Information Systems from the University of Belgrano (Argentina), a
Master’s Degree in Software Engineering from the Computer Science Department of the Polytechnic
University of Madrid (Spain), and a Ph.D. in Information Sciences from the National University of
La Plata (Argentina). He works as a tenured full professor in graduate and postgraduate courses at
Buenos Aires University, Austral University and National University of Lanús. His research
interests are: artificial intelligence, data mining, and blockchain technologies. In the professional field,
he works as a Scientific Research Director in an Artificial Intelligence, Data Science, Blockchain &
Smart Contracts company in Argentina.
Spatial association discovery process using frequent subgraph mining

  • 1. TELKOMNIKA Telecommunication, Computing, Electronics and Control Vol. 18, No. 4, August 2020, pp. 1884∼1891 ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, No: 21/E/KPT/2018 DOI: 10.12928/TELKOMNIKA.v18i4.13858 Ì 1884 Spatial association discovery process using frequent subgraph mining Giovanni Dai´an Rottoli1 , Hern´an Merlino2 1 Universidad Nacional de La Plata, Argentina 1 Universidad Tecnologica Nacional, Argentina 1,2 Information Systems Research Group, National University of Lan´us, Buenos Aires Article Info Article history: Received Aug 10, 2019 Revised Mar 10, 2020 Accepted Apr 3, 2020 Keywords: Frequent subgraph mining SARM Spatial association mining Spatial data mining Spatial knowledge discovery ABSTRACT Spatial associations are one of the most relevant kinds of patterns used by business intelligence regarding spatial data. Due to the characteristics of this particular type of information, different approaches have been proposed for spatial association mining. This wide variety of methods has entailed the need for a process to integrate the ac- tivities for association discovery, one that is easy to implement and flexible enough to be adapted to any particular situation, particularly for small and medium-size projects to guide the useful pattern discovery process. Thus, this work proposes an adaptable knowledge discovery process that uses graph theory to model different spatial rela- tionships from multiple scenarios, and frequent subgraph mining to discover spatial associations. A proof of concept is presented using real data. This is an open access article under the CC BY-SA license. Corresponding Author: Giovanni Dai´an Rottoli, Departamento de Ingenier´ıa en Sistemas de Informaci´on, Universidad Tecnol´ogica Nacional, F.R. Concepci´on del Uruguay, 676 Ing. Pereira Street, Concepci´ıon del Uruguay (3260), Entre R´ıos, Argentina. Email: [email protected] 1. 
INTRODUCTION Spatial knowledge discovery aims to find useful and novel patterns in spatial datasets to support decision-making in a particular problem domain [1]. Among all the possible patterns to discover, spatial asso- ciations are one of the most commonly used today in multiple fields such as climatology, geography, geology, criminology and ecology, among many others. They are comprised of predicates that involve spatial objects along with spatial and non-spatial relationships between those objects [2]. There are many challenges associ- ated with the characteristics of spatial data that make this data mining task more complicated, such as the spatial dependency data attributes, the multiplicity of spatial data representation models, the spatial relations between data objects and some particular spatial properties such as spatial autocorrelation and spatial heterogeneity [3]. Multiple algorithms have been developed for association pattern mining that can be used. Each of these algorithms, in general, aims to solve particular concerns about the aforemetioned challenges. The se- lection of a proper algorithm has become an arduous activity due to the growing number of new alternatives and their variants, specially to inexperienced users. Thus, it is necessary to provide a new process for small or medium-size application domains, one that is easy to implement and flexible enough to be adapted to multiple contexts. Consequently, this paper proposes a new process for association mining discovery from spatial data Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA
  • 2. TELKOMNIKA Telecommun Comput El Control Ì 1885 that utilizes graph theory to model spatial objects and the relations between them and frequent subgraph mining to find the substructures with a high repetition rate inside the general graph. These substructures correspond to association patterns. The proposal is a new alternative to model complex situations from a particular problem domain, but not replace or improve results from the algorithms in the state-of-the-art, however it provides a road map to initially address a problem. The rest of this paper is arranged as follows: section 2. on the charac- teristics of spatial data; section 3. contains association patterns and their characteristics regarding spatial data; section 4. includes the proposed process for discovery of spatial associations; a proof of concept using real world data is shown in section 5. Lastly, section 6. contains conclusions and future works . 2. SPATIAL DATA Spatial data is a particular type of dependent data. Formally, a spatial database D is a set of spatial records D = {T1, T2, · · · , Td} with Ti = {S1 i , S2 i , · · · , Sm i , X1 i , X2 i , · · · , Xn i }, where each Sk i is a spatial attribute that stores values about the spatial contexts, and each Xl i is a non-spatial attributes with values mea- sured at particular locations [3, 4]. The non-spatial attributes may be numerical or categorical according to the problem domain and the spatial attributes may be specified as coordinates or places (e.g. city name or state code). Additionally, there are three basic types of spatial objects: points, used to model specific punctual locations in the space; lines, used to model linear extensions such as rivers or roads; and polygons, used to represent objects that have a two-dimensional extension in the space, such as regions or states. The dependence of non-spatial attributes on spatial ones means that different implicit spatial relations can be extracted from data. 
Let D be a spatial database, a relation R ⊆ D2 is called spatial if and only if it is defined through a binary predicate P(x, y)|x, y ∈ D that involves the spatial attributes from the spatial objects x and y. For example, the spatial relation N ⊆ D2 , with x, y ∈ D, defined by the predicate shown in (2), is the neighborhood relations between two spatial points using euclidean distance: xNy ⇐⇒ Dist(x, y) < λ , λ ∈ R+ These relations can be classified as geometric, if they are related to the principles of euclidean geom- etry (e.g. neighbouring relationships); directional, when they refer to relative spatial orientations (e.g. above, below, north, east); topological, if they are independent from the concepts of distance and direction and are not affected by spatial transformations such as rotation or translation (e.g. intersect, inside), or hybrid, if they are related to two or more of the aforementioned types of properties. These relationships can be calculated using different methods depending on the problem domain and the class of spatial data used: points, lines or polygons [5, 6]. On the other hand, two properties are derived from spatial dependence: spatial autocorrelation, i.e., observations of spatially distributed random variables are not location-independent, and spatial heterogeneity, i.e., patterns found in some region of the space may not have the same support in other region. Spatial auto- correlation refers to the particularity of spatial data to not be distributed independently throughout the space. The distribution depends on the characteristics of the data points, the characteristics of the underlying space or the spatial neighboring relationships. For example, churches tends to be located near public squares or animal tends to travel to locations that contain their food sources [7]. Spatial heterogeneity is related to spatial auto- correlation. 
This phenomenon describes the local nature of spatial patterns, which are subordinated to some specific locations. Thus, a spatial pattern, such as association rules, may have a high support value in a region and a low support value in a different one. This phenomenon is also known as Simpson’s paradox [8]. All these particular characteristics make knowledge extraction from spatial data become a complex activity which not only has to consider patterns between data records, but also the implicit relationships between spatial objects. 3. SPATIAL ASSOCIATIONS One of the most common patterns to find in data is the association pattern. An association pattern P is defined as an n-ary predicate P = (p1, p2, · · · , pn) with a high probability of occurrence in the dataset. Its classic application is the supermarket basket analysis to discover whether or not there is some correlation between items that are bought together. An association pattern is referred to as spatial if at least one of its atomic predicates pk involves a spatial relationship between its variables [2]. For example, in a city C, churches and public squares tend to be neighbors: City(C) ∧ Church(X) ∧ PublicSquare(Y ) ∧Inside(X, C) ∧ Inside(Y, C) ∧ Neighbors(X, Y ) As shown in the previous example, Inside(X, C), Inside(Y, C) and Neighbors(X, Y ) are spatial Spatial association discovery process using frequent subgraph mining (Giovanni Dai´an Rottoli)
  • 3. 1886 Ì ISSN: 1693-6930 predicates related to topological and geometric relationships. Many different relations must be taken into consideration at the same time to find useful spatial associations. Also, these relations must be calculated in local contexts, due to the aforementioned Simpson’s Paradox. Multiple efforts have been made in order to find spatial association patterns in spatial databases: [7] proposes a method for spatial association mining that consider spatial autocorrelation by using a cell structure; [9] focuses on the problem of rule extraction from spatial data with crisp condition attributes and fuzzy deci- sions. A rough-fuzzy set based rule extraction model is used to deal with both fuzziness and roughness; [10] combines and extend techniques developed in both spatial and fuzzy data mining to deal with the uncertainty found in typical spatial data. This proposal uses fuzzy logic to get relevant information from transition areas between spatial neighborhoods to spatial association mining and for spatial relationships modelling; [11, 12] propose an algorithm for local patterns discovery considering spatial heterogeneity that incorporates a novel spatial metric for support evaluation based on event density in a particular area; [13] presents a specially de- signed algorithm to discover spatial associations related to El Ni˜no Southern Oscillation (ENSO); [14] applies an algorithm that explores multiple spatial objects hierarchies; [15] uses A-Priori-based approaches to find spatial association rules; [6, 16] propose using Inductive Logic Programming (ILP) for reach this data mining purpose by modelling and stracting high support spatial relations from spatial data. 
[17] worked with meta- heuristics such as genetic algorithms and evolutionary programming; [18] suggested a data-transformation approach before using traditional association rule mining algorithms; [19] introduced non-trivial structures such as graphs for spatial relationship representation; among others. Because of this variety of spatial data mining approaches for association discovery, it is difficult to select a proper algorithm or method to be used in small knowledge discovery application contexts. Because of this, a unified and general process is required to deal with the aforementioned problems and it has to be flexible enough to be adapted to multiple particular situations and easy to implement. 4. SPATIAL ASSOCIATION DISCOVERY PROCESS This work describes a new process for spatial association extraction considering the possibility of having multiple relationships between spatial objects of any kind (i.e. points, lines, polygons), and considering the spatial autocorrelation and spatial heterogeneity. This process is designed as a first approach to get spatial association knowledge from data in particular contexts easy to implement in small or medium-size projects. The process Figure 1 is divided into 5 main steps: data preparation (section 4.1.), neighborhood definition (section 4.2.), modelling of spatial relationships using graphs (section 4.3.), frequent subgraph mining (section 4.4.) and evaluation of results (section 4.5.). Figure 1. Spatial association discovery process 4.1. Data preparation The proposed process starts with a spatial data preparation step. It is necessary to codify the various spatial datasets obtained from different sources in different formats, in order to enable the extraction of relations between all the data instances in later steps. In general terms, it is not uncommon to have multiple spatial objects layers, each of them with a particular representation type and related to a particular scenario from the problem domain. 
On the other hand, two types of datasets must be considered: target datasets, with objects directly TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 4, August 2020 : 1884 – 1891
  • 4. TELKOMNIKA Telecommun Comput El Control Ì 1887 related to the problem domain that are going to be present in every association pattern, and relevant datasets, that may or may not be related to the target datasets, but add important information that may be useful for decision making [20]. These data must be prepared by cleaning errors, solving inconsistent and null values, and dealing with outliers. New attributes or even new data objects could be generated using the input data. This step requires considerable effort and may require many iterations. Thus, it is advisable to implement the process using a proper methodology such as CRISP-DM [21]. 4.2. Neighborhood definition As mentioned before, a particular spatial association pattern may have a higher occurrence probability in some regions and lower probability in others [8]. For this reason it is preferred to search for this kind of pattern locally. For this, we propose defining partitions of the dataset, called neighborhoods in this context, and the subsequent execution of the association pattern search algorithm on each of them. These neighborhoods can be defined beforehand using knowledge from to the problem domain, or using spatial clustering techniques. Using density-based or distance-based spatial clustering algorithms [22– 24] is suggested due to the First Law of Geography, which states that spatial objects located together are more closely related than those that are far away from each other [25, 26]. Nonetheless, there is an issue to consider in this step: the limits between neighborhoods may add important information for spatial association mining. Thus, the use of fuzzy clustering techniques or flexible boundaries models may be desirable. 4.3. Modelling of spatial relationships using graphs Now, we have to calculate the spatial relations between the target data instances and the instances of the relevant dataset from each neighborhood. 
Depending on the problem domain, different types of spatial relations can be calculated: euclidian, topological, directional or hybrid relationships, as mentioned above [6]. This might be a step with a high computational cost. Graph theory is proposed to model the spatial relationships due to its close relation with first order logic and the pattern to find [16]. Graphs are discrete structures consisting of vertices and edges that connect these vertices. There are different kinds of graphs, depending on whether edges have directions (digraphs), whether multiple edges can connect the same pair of vertices (multigraphs), and whether loops are allowed. Formally, a simple graph G = (V, E) consists of V, a nonempty set of vertices (or nodes) and E, a set of edges. Each edge has two vertices associated with it, called its endpoints. An edge is said to connect its endpoints. To relate each edge to its endpoints, a function φ : E → {v1 ∈ V, v2 ∈ V }, called incidence function, is used. A multigraph, on the other hand, is a graph where multiple edges can exist associated with the same endpoints. Additionally, each vertex and each edge can be labeled with data related to the represented object. This structure can be adapted to multiple scenarios and multiple efficient algorithms can be used to extract valuable information such as maximum cliques [27]. In the context of this work, multigraphs are used to model spatial objects as vertices and the relations between them as edges. A small example can be seen in Figure 2 (a). Two sets of labels and two extra functions to asign those labels to the vertices and edges are needed. 
So, let G be a multigraph without loops G = (V, E, L, K, φ,l ,k ) where: V is the vertex set of G, which corresponds to the spatial objects from the datasets; E is the edge set that corresponds to each calculated relationship between the spatial objects; L is the vertex label set with the characteristics of the spatial data objects; K is the edge label set, with the characteristics of each spatial relation; φ : E → {x ∈ P(V )/|x| ≤ 2} is the incidence function; l ⊆ V × L and k ⊆ E × K are labeling relations. The aforementioned structure makes it possible to model multiple different relationships with the same endpoints labeled with different attributes. Also, many attributes of spatial objects could be taken into consideration. Additionally, it must be noted that loops (i.e. edges with only one endpoint) are not considered because their lack of semantics in this context (there are not spatial relationships that involves only one spatial object). Fuzzy logic could also be a valuable tool to model the spatial relationships, if the situation requires it [10]. More information about fuzzy logic this can be found in [28] 4.4. Frequent subgraph mining To extract spatial associations with a high probability of occurrence, frequent subgraph mining is pro- posed to be used for each modeled graph. Given a multigraph G = (V, E, L, K, φ,l ,k ) like the one described in the previous section, the frequent subgraph mining problem in a single multigraph is finding recurring subgraph Spatial association discovery process using frequent subgraph mining (Giovanni Dai´an Rottoli)
  • 5. 1888 Ì ISSN: 1693-6930 Gi ⊂ G, or in other words, a subgraph that has multiple instances in the original graph Figure 2 (b). It must be noted that two graphs are isomorphic if all of their vertices and edges are shared including its labels.These frequent subgraphs represent the relationships between spatial object types that take place in the space with a high occurrence probability. Multiple algorithms have been designed for frequent subgraph mining in a single big graph, calcu- lating the relevance of a pattern in different ways. Some well-known examples of this are IncGM+, FSSG, SUBDUE, among others [29, 30]. A set of frequent subgraphs for each neighborhood is obtained as a result of this step and must be analyzed to obtain useful knowledge for decision-making. 4.5. Evaluation of results In the final step, frequent subgraphs translated into n-ary predicates that represent trivial information (non-novel patterns) must be filtered. The support and confidence measures can be extracted, selecting the metrics that the desicion-maker consider to be more appropiate. This activity could be performed automatically or manually by an analyst with knowledge about the problem domain with help from an expert. PublicSquareX1 CityA ChurchY1 Include Include Neighbors (a) (b) Figure 2. (a) Simplified example of spatial relationship modelling using a simple graph. (b) Example of frequent subgraph (bottom) found in a simple graph without labels in the edges. 5. PROOF OF CONCEPT The proof of concept presented in this section is intended to show how the proposed process works, implemented by different programming and data mining tools. The data used in this example consists of 10 data files containing the location of facilities in Buenos Aires (Argentina) and its surroundings. These facilities include libraries(74), clinics(63), post offices(55), sports halls(50), nightclubs(41), schools(107), gas stations (97), churches (125), museums(37) or police stations (93). 

In the preparation step of the proposed process, the data files were integrated into a single data file of spatial points using QGIS (http://qgis.org/). Each spatial point comprises two spatial attributes, Latitude and Longitude, and one non-spatial attribute, the type of building from the previous list. After that, the points located outside the Buenos Aires limits were filtered out to reduce the search space, leaving 742 spatial points (Figure 3 (a), orange).
Then, in the neighborhood definition step, the HDBSCAN clustering algorithm [31] from the 'dbscan' library of the R programming language was used on the spatial data attributes to generate two neighborhoods, with a minimum number of points equal to 50 in each of them (Figure 3 (a), blue). Only two neighborhoods were used, for explanatory purposes.
In the next step, for each of the generated neighborhoods, a geometric relationship between its data points was extracted, forming a graph with vertices labeled with the type of facility related to each data point and edges labeled "close to" if the adjacent points were less than 150 meters away from each other (this value was selected for illustrative purposes only). Thus, two graphs were created: one with 71 vertices and 45 edges in neighborhood 1, and another with 15 vertices and 11 edges in neighborhood 2.
To obtain the frequent subgraphs of each of the generated graphs, the SUBDUE algorithm was used via its implementation in the Subdue Graph Miner software, using the compression rate as the support measure.
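
The relationship modeling step described above can be sketched in Python as follows. The paper does not state how the distances were computed or which tool built the graphs, so the haversine formula, the function names `haversine_m` and `build_close_to_graph`, and the sample coordinates are all assumptions made for illustration. Only the 150-meter threshold and the "close to" label come from the text.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical model, assumed)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_close_to_graph(points, threshold_m=150.0):
    """Build a graph from a list of (facility_type, latitude, longitude) tuples.

    Returns (vertex_labels, edges), where vertex_labels maps a point index to
    its facility type and edges is a list of (i, j, "close to") with i < j.
    """
    vertex_labels = {i: p[0] for i, p in enumerate(points)}
    edges = []
    # Every unordered pair is compared once; this is quadratic in the number
    # of points, which is acceptable for a neighborhood of a few hundred points.
    for (i, (_, lat1, lon1)), (j, (_, lat2, lon2)) in combinations(enumerate(points), 2):
        if haversine_m(lat1, lon1, lat2, lon2) < threshold_m:
            edges.append((i, j, "close to"))
    return vertex_labels, edges


# Hypothetical sample points, not taken from the paper's dataset.
sample = [
    ("Post office", -34.6037, -58.3816),
    ("Nightclub", -34.6040, -58.3820),
    ("Church", -34.6200, -58.4000),
]
labels, edges = build_close_to_graph(sample)
```

In this sample the first two points are roughly 50 meters apart and are linked, while the third lies well over a kilometer away and remains isolated. The resulting labeled vertices and edges would then be exported to whatever input format the chosen frequent subgraph miner expects.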
The result was one subgraph in neighborhood 1, shown in Figure 3 (b), with a compression rate of 15.5%, which was translated into the predicate
Post office(x1) ∧ Nightclub(x2) ∧ Close to(x1, x2)
and two subgraphs in neighborhood 2, both with a compression rate of 27.2%, which were translated into the predicates
Clinic(x1) ∧ Post office(x2) ∧ Close to(x1, x2)
Post office(x1) ∧ Sport hall(x2) ∧ Close to(x1, x2)

Figure 3. (a) Spatial neighborhoods generated for the proof of concept using HDBSCAN algorithm; (b) Results of the proof of concept.

5.1. Discussion
The first contribution of the proposed process is the possibility of adapting it to multiple scenarios, because its flexible underlying structure is based on graphs. Some of the aforementioned methods use flexible structures too [6, 16], but their complexity increases because of the use of techniques based on Logic Programming. On the other hand, some other methods do not take complex patterns into account [19].
Furthermore, the process makes it possible to include valuable information related to the data objects and the spatial relations by using labels in the graph representation. Generally, the data structures involved in other methods do not take into account complex data associated with the spatial relations between spatial data.
In relation to the above, the proposed process considers spatial phenomena such as autocorrelation and heterogeneity by using spatial neighborhoods. Some alternatives, such as [7], consider spatial autocorrelation but not spatial heterogeneity or complex data relationships. In most of the cases studied, these characteristics are present due to their relevance in data mining.
Also, related to this, the proposed process can be implemented using existing tools such as frequent subgraph mining algorithms and clustering algorithms.
Some of the state-of-the-art alternatives include very flexible and powerful strategies, but they are hard to implement, which makes them unsuitable for small or medium-size projects [6, 9, 16, 19].
Lastly, the high adaptability of the procedure is a desirable characteristic, because many algorithms can be selected for the implementation of each step. Usually, the state-of-the-art methods propose a single alternative for their execution.

6. CONCLUSION
This work describes a knowledge discovery process for the extraction of spatial associations. The process is flexible enough to take into account multiple and varied spatial relationships between spatial objects of any kind, using a graph structure to model them. Heterogeneity and autocorrelation phenomena are also considered by defining neighborhoods where the search process is performed to find this class of regularity.
The solution was designed as an initial approach to this data mining task that does not require worrying too much about the particular characteristics of data mining algorithms. In a large-scale project, this process could guide the selection of specific methods based on the results obtained in the first iterations of an incremental methodology.
A proof of concept is presented as well, using real data to illustrate how the process is implemented using different programming and data mining tools in each of the proposed steps.
In future work, the research will focus on implementation strategies tailored to the problem domain for each of the steps of the process, in order to decrease execution time when dealing with large numbers of spatial objects and spatial relationships. Also, fuzzy methods will be considered for relation modelling and neighborhood definition.

ACKNOWLEDGEMENT
The research presented in this paper was partially funded by the PhD Scholarship Program to reinforce R&D&I areas (2016-2020) of the Universidad Tecnológica Nacional and the Research Project 80020160400001 LA of National University of Lanús. The authors also want to extend their gratitude to Kevin-Mark Bozell Poudereux, for proofreading the translation.

REFERENCES
[1] R. Garcia-Martinez, P. Britos, and D. Rodriguez, “Information mining processes based on intelligent systems,” International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 402-410, 2013.
[2] K. Koperski and J. Han, “Discovery of spatial association rules in geographic information databases,” International Symposium on Spatial Databases, pp. 47-66, 1995.
[3] Y. Leung et al., “Knowledge discovery in spatial data,” Springer, 2010.
[4] C. C. Aggarwal, “Data mining: The textbook,” Springer, 2015.
[5] R. Agrawal et al., “Fast algorithms for mining association rules,” Proc. 20th int. conf. very large data bases, vol. 1215, pp. 487-499, 1994.
[6] A. Appice, M. Ceci, A. Lanza, F. A. Lisi, and D. Malerba, “Discovery of spatial association rules in geo-referenced census data: A relational mining approach,” Intelligent Data Analysis, vol. 7, no. 6, pp. 541-566, 2003.
[7] J. Chen, “An algorithm about association rule mining based on spatial autocorrelation,” The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 37, no. B6b, pp. 99-106, 2008.
[8] E. H. Simpson, “The interpretation of interaction in contingency tables,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 13, no. 2, pp. 238-241, 1951.
[9] H. Bai, Y. Ge, J. Wang, D. Li, Y. Liao, and X. Zheng, “A method for extracting rules from spatial data based on rough fuzzy sets,” Knowledge-Based Systems, vol. 57, pp. 28-40, 2014.
[10] R. Ladner, F. E. Petry, and M. A. Cobb, “Fuzzy set approaches to spatial data mining of association rules,” Transactions in GIS, vol. 7, no. 1, pp. 123-138, 2003.
[11] Z. Sha and X. Li, “Mining local association patterns from spatial dataset,” Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), vol. 3, pp. 1455-1460, 2010.
[12] Z. Sha, X. Tan, and Y. Bai, “Localized spatial association: A case study for understanding vegetation successions in a typical grassland ecosystem,” Geo-Informatics in Resource Management and Sustainable Ecosystem, pp. 33-45, 2015.
[13] X. Cunjin and L. Xiaohan, “Novel algorithm for mining ENSO-oriented marine spatial association patterns from raster-formatted datasets,” ISPRS International Journal of Geo-Information, vol. 6, no. 5, pp. 1-15, 2017.
[14] A. Salleb and C. Vrain, “An application of association rules discovery to geographic information systems,” European Conference on Principles of Data, pp. 613-618, 2000.
[15] S. S. U. Sutjipto, I. S. Sitanggang, and B. Barus, “Potential usage estimation of ground water using spatial association rule mining,” TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 15, no. 1, pp. 504-511, 2017.
[16] D. Malerba, F. Esposito, F. A. Lisi, and A. Appice, “Mining spatial association rules in census data,” Research in Official Statistics, vol. 5, no. 1, pp. 19-44, 2003.
[17] A. H. Goudarzi and N. Ghadiri, “A hybrid spatial data mining approach based on fuzzy topological relations and moses evolutionary algorithm,” Artificial Intelligence, Cornell University, 2017.
[18] I. Lee, “Mining multivariate associations within gis environments,” International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 1062-1071, 2004.
[19] H. Yang, S. Parthasarathy, and S. Mehta, “Mining spatial object associations for scientific data,” II International Joint Conference on Artificial Intelligence (IJCAI), pp. 902-907, 2005.
[20] V. Bogorny, P. M. Engel, and L. O. Alvares, “Geoarm: an interoperable framework to improve geographic data preprocessing and spatial association rule mining,” SEKE, pp. 79-84, 2006.
[21] R. Wirth and J. Hipp, “Crisp-dm: Towards a standard process model for data mining,” Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining, pp. 29-39, 2000.
[22] J. Sander et al., “Density-based clustering in spatial databases: The algorithm gdbscan and its applications,” Data mining and knowledge discovery, vol. 2, no. 2, pp. 169-194, 1998.
[23] Y. Zhu, K. M. Ting, and M. J. Carman, “Density-ratio based clustering for discovering clusters with varying densities,” Pattern Recognition, vol. 60, pp. 983-997, 2016.
[24] A. Sharma, R. Gupta, and A. Tiwari, “Improved density based spatial clustering of applications of noise clustering algorithm for knowledge discovery in spatial data,” Mathematical Problems in Engineering, vol. 2016, 2016.
[25] W. R. Tobler, “Cellular geography,” Philosophy in geography, pp. 379-386, 1979.
[26] J. Duan, L. Wang, and X. Hu, “The effect of spatial autocorrelation on spatial co-location pattern mining,” International Conference on Computer, Information and Telecommunication Systems, pp. 210-214, 2017.
[27] G. D. Rottoli, H. Merlino, and R. García-Martinez, “Co-location rules discovery process focused on reference spatial features using decision tree learning,” International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 221-226, 2017.
[28] D. J. Dubois, “Fuzzy sets and systems: theory and applications,” Academic press, vol. 144, 1980.
[29] E. Abdelhamid, M. Canim, M. Sadoghi, B. Bhattacharjee, Y. Chang, and P. Kalnis, “Incremental frequent subgraph mining on large evolving graphs,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 12, pp. 2710-2723, 2017.
[30] D. Kavitha, V. Kamakshi, and J. Murthy, “Finding frequent subgraphs in a single graph based on symmetry,” International Journal of Computer Applications, vol. 146, no. 11, pp. 0975-8887, 2016.
[31] R. J. Campello, D. Moulavi, A. Zimek, and J. Sander, “Hierarchical density estimates for data clustering, visualization, and outlier detection,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 10, no. 1, pp. 5, 2015.

BIOGRAPHIES OF AUTHORS
Giovanni Daián Rottoli is a researcher at the Computational Intelligence and Software Engineering Research Group (GIICIS) from the National University of Technology (Argentina). He holds a Bachelor's Degree in Information Systems from the aforementioned university (2015). He is currently a Ph.D. candidate in Computer Science at the National University of La Plata (Argentina). He works as an associate professor of Discrete Mathematics and Data Science at the National University of Technology (Argentina). His research is focused on the fields of spatial data mining and knowledge discovery, artificial intelligence and search-based software engineering. Further info can be found on his profile: http://www.frcu.utn.edu.ar/giicis/rottolig/

Hernán Merlino is the head of the Advanced Information Systems Laboratory at Buenos Aires University (Argentina) and the head of the Artificial Intelligence Laboratory at the National University of Lanús (Argentina). He is a fellow of the Gas and Petroleum Institute at Buenos Aires University. He holds a Bachelor's Degree in Information Systems from the University of Belgrano (Argentina), a Master's Degree in Software Engineering from the Computer Science Department of the Polytechnic University of Madrid (Spain), and a Ph.D. in Information Sciences from the National University of La Plata (Argentina). He works as a tenured full professor in graduate and postgraduate courses at Buenos Aires University, Austral University and National University of Lanús. His research interests are artificial intelligence, data mining, and blockchain technologies. In the professional field, he works as a Scientific Research Director in an Artificial Intelligence, Data Science, Blockchain & Smart Contracts company in Argentina.