Summary
In this chapter, you have learned how to process unstructured information and how to represent such information by means of a graph. Starting from a well-known benchmark dataset, Reuters-21578, we applied standard NLP engines to tag and structure textual information. These high-level features were then used to create different types of networks: knowledge-base networks, bipartite networks, projections of bipartite networks onto each subset of node types, and a topic-topic similarity network. These graphs also allowed us to apply the tools presented in previous chapters to extract insights from the network representation.
We used local and global properties to show you how these quantities can represent and describe structurally different types of networks. Unsupervised techniques were then used to identify semantic communities and to cluster together documents belonging to similar subjects/topics. Finally, we used the labeled information provided...