Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Graph Machine Learning

You're reading from   Graph Machine Learning Learn about the latest advancements in graph data to build robust machine learning models

Arrow left icon
Product type Paperback
Published in Jul 2025
Publisher Packt
ISBN-13 9781803248066
Length 434 pages
Edition 2nd Edition
Languages
Tools
Arrow right icon
Authors (3):
Arrow left icon
Aldo Marzullo Aldo Marzullo
Author Profile Icon Aldo Marzullo
Aldo Marzullo
Enrico Deusebio Enrico Deusebio
Author Profile Icon Enrico Deusebio
Enrico Deusebio
Claudio Stamile Claudio Stamile
Author Profile Icon Claudio Stamile
Claudio Stamile
Arrow right icon
View More author details
Toc

Table of Contents (20) Chapters Close

Preface 1. Part 1: Introduction to Graph Machine Learning
2. Getting Started with Graphs FREE CHAPTER 3. Graph Machine Learning 4. Neural Networks and Graphs 5. Part 2: Machine Learning on Graphs
6. Unsupervised Graph Learning 7. Supervised Graph Learning 8. Solving Common Graph-Based Machine Learning Problems 9. Part 3: Practical Applications of Graph Machine Learning
10. Social Network Graphs 11. Text Analytics and Natural Language Processing Using Graphs 12. Graph Analysis for Credit Card Transactions 13. Building a Data-Driven Graph-Powered Application 14. Part 4: Advanced topics in Graph Machine Learning
15. Temporal Graph Machine Learning 16. GraphML and LLMs 17. Novel Trends on Graphs 18. Index
19. Other Books You May Enjoy

Dealing with large graphs

When approaching a use case or an analysis, it is very important to understand how large the data we focus on is or will be in the future, as the dimension of the datasets may very well impact both the technologies we use and the analysis that we can do. As already mentioned, some of the approaches that have been developed on small datasets hardly scale to real-world applications and larger datasets, making them useless in practice.

When dealing with (possibly) large graphs, it is crucial to understand potential bottlenecks and limitations of the tools, technologies, and/or algorithms we use, assessing which part of our application/analysis may not scale when increasing the number of nodes or edges. Even more importantly, it is crucial to structure a data-driven application, however simple or at what early stage of the proof of concept (POC), in a way that would allow its scaling out in the future when data/users would increase, without rewriting the whole application.

Creating a data-driven application that resorts to graphical representation/modeling is a challenging task that requires a design and implementation that is a lot more complicated than simply importing NetworkX. In particular, it is often useful to decouple the component that processes the graph—the graph processing engine—from the one that allows querying and traversing the graph—the graph storage layer. We will further discuss these concepts in Chapter 10, Building a Data-Driven Graph-Powered Application. Nevertheless, given the focus of the book on ML and analytical techniques, it makes sense to focus more on graph processing engines than on graph storage layers. We, therefore, find it useful at this stage to provide you with some of the technologies that are used for graph processing engines to deal with large graphs, crucial when scaling out an application.

In this respect, it is important to classify graph processing engines into two categories (that impact the tools/libraries/algorithms to be used), depending on whether the graph can fit a shared memory machine or requires distributed architectures to be processed and analyzed.

Note that there is no absolute definition of large and small graphs, but it also depends on the chosen architecture. Nowadays, thanks to the vertical scaling of infrastructures, you can find servers with random-access memory (RAM) larger than 1 terabyte (TB) (usually called fat nodes), and with tens of thousands of central processing units (CPUs) for multithreading in most cloud-provider offerings, although these infrastructures might not be economically viable. Even without scaling out to such extreme architectures, graphs with millions of nodes and tens of millions of edges can nevertheless be easily handled in single servers with ~100 gigabytes (GB) of RAM and ~50 CPUs.

Although networkx is a very popular, user-friendly, and intuitive library, when scaling out to such reasonably large graphs, it may not be the best available choice. NetworkX, being natively written in pure Python, which is an interpreted language, can be substantially outperformed by other graph engines fully or partly written in more performant programming languages (such as C++ and Julia) and that make use of multithreading, such as the following:

All the preceding libraries are valid alternatives to NetworkX when achieving better performance becomes an issue. Improvements can be very substantial, with speed-ups varying from 30 to 300 times faster, with the best performance generally achieved by LightGraphs.

In the forthcoming chapters, we will mostly focus on NetworkX in order to provide a consistent presentation and provide you with basic concepts on network analysis. We want you to be aware that other options are available, as this becomes extremely relevant when pushing the edge from a performance standpoint.

You have been reading a chapter from
Graph Machine Learning - Second Edition
Published in: Jul 2025
Publisher: Packt
ISBN-13: 9781803248066
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime