Exploring London NoSQL meetups using R
Mark Needham
@markhneedham
Meetup Analytics with R and Neo4j
Scraper at the ready...
Not needed :(
Lots of bits of data
● Events
● Members
● Groups
● RSVPs
● Venues
● Topics
The data model
Interesting questions to ask...
● What day of the week do people go to meetups?
● Whereabouts in London are NoSQL meetups held?
● Do people sign up for multiple meetups on the same day?
● Are there common members between groups?
● What topics are people most interested in?
● In which order do people join the NoSQL groups?
● Who are the most connected people on the NoSQL scene?
The tool set
● RNeo4j (sends a query, returns the results as a data frame)
● dplyr
● ggplot2
● igraph
● ggmap
● cluster
● geosphere
When do people go to meetups?
(g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-()
When do people go to meetups?
MATCH (g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-()
WHERE (event.time + event.utc_offset) < timestamp()
RETURN g.name,
event.time + event.utc_offset AS eventTime,
event.announced_at AS announcedAt,
event.name,
COUNT(*) AS rsvps
R Neo4j
install.packages("devtools")
devtools::install_github("nicolewhite/RNeo4j")
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
query = "MATCH … RETURN …"
cypher(graph, query)
Grouping events by month
library(dplyr)
events %>%
  group_by(month) %>%
  summarise(events = n(),
            count = sum(rsvps),
            max = max(rsvps)) %>%
  mutate(ave = count / events) %>%
  arrange(desc(ave))
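The grouping above assumes `events` already carries `month` and `day` columns, which the slides do not show being built. A minimal sketch of one way to derive them, assuming `eventTime` is in milliseconds since the Unix epoch (as the `event.time + event.utc_offset` expression in the query suggests). The helper name `add_calendar_columns` and the sample rows are made up for illustration.

```r
# Hypothetical helper: derive month and weekday columns from a
# millisecond timestamp column named eventTime.
add_calendar_columns <- function(events) {
  # Convert milliseconds to seconds. UTC is used because the query has
  # already added the UTC offset to the event time.
  when <- as.POSIXlt(as.POSIXct(events$eventTime / 1000,
                                origin = "1970-01-01", tz = "UTC"))
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  # Ordered factor levels make arrange(month) and arrange(day) sort in
  # calendar order rather than alphabetically.
  events$month <- factor(month.name[when$mon + 1], levels = month.name)
  events$day <- factor(day_names[when$wday + 1],
                       levels = c(day_names[2:7], day_names[1]))
  events
}

# Made-up sample rows: 2014-11-18 00:00 UTC and 2014-01-01 00:00 UTC.
sample_events <- data.frame(eventTime = c(1416268800000, 1388534400000),
                            rsvps = c(40, 25))
with_calendar <- add_calendar_columns(sample_events)
print(with_calendar[, c("month", "day", "rsvps")])
```

Using `month.name` and an explicit vector of day names, instead of `format()` or `weekdays()`, keeps the labels in English regardless of the machine's locale.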
Grouping events by month
## month events count ave
## 1 November 55 3018 54.87273
## 2 May 52 2676 51.46154
## 3 April 58 2964 51.10345
## 4 June 47 2384 50.72340
## 5 October 71 3566 50.22535
## 6 September 59 2860 48.47458
## 7 February 43 2047 47.60465
## 8 January 34 1592 46.82353
## 9 December 24 1056 44.00000
## 10 March 39 1667 42.74359
## 11 July 48 1866 38.87500
## 12 August 34 1023 30.08824
Grouping events by day
events %>%
  group_by(day) %>%
  summarise(events = n(),
            count = sum(rsvps),
            max = max(rsvps)) %>%
  mutate(ave = count / events) %>%
  arrange(day)
Grouping events by day
## day events count ave
## 1 Monday 63 4034 64.03175
## 2 Tuesday 151 6696 44.34437
## 3 Wednesday 225 9481 42.13778
## 4 Thursday 104 5394 51.86538
## 5 Friday 11 378 34.36364
## 6 Saturday 10 736 73.60000
Some simple bar charts
library(ggplot2)
library(gridExtra)  # provides grid.arrange()
# byDay holds the per-day summary computed on the previous slide
g1 = ggplot(aes(x = day, y = ave), data = byDay) +
  geom_bar(stat="identity", fill="dark blue") +
  ggtitle("Average attendees by day")
g2 = ggplot(aes(x = day, y = count), data = byDay) +
  geom_bar(stat="identity", fill="dark blue") +
  ggtitle("Total attendees by day")
grid.arrange(g1, g2, ncol = 1)
London hits the pub
Where do people go to meetups?
(g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-(),
(event)-[:HELD_AT]->(venue)
Where do people go to meetups?
MATCH (g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-(), (event)-[:HELD_AT]->(venue)
WHERE (event.time + event.utc_offset) < timestamp()
RETURN g.name,
event.time + event.utc_offset AS eventTime,
event.announced_at AS announcedAt,
event.name,
venue.name AS venue,
venue.lat AS lat,
venue.lon AS lon,
COUNT(*) AS rsvps
Where do people go to meetups?
byVenue = events %>%
  count(lat, lon, venue) %>%
  ungroup() %>%
  arrange(desc(n)) %>%
  rename(count = n)
Where do people go to meetups?
## lat lon venue count
## 1 51.50256 -0.019379 Skyline Bar at CCT Venues Plus 1
## 2 51.53373 -0.122340 The Guardian 1
## 3 51.51289 -0.067163 Erlang Solutions 3
## 4 51.49146 -0.219424 Novotel - W6 8DR 1
## 5 51.49311 -0.146531 Google HQ 1
## 6 51.52655 -0.084219 Look Mum No Hands! 22
## 7 51.51976 -0.097270 Vibrant Media, 3rd Floor 1
## 8 51.52303 -0.085178 Mind Candy HQ 2
## 9 51.51786 -0.109260 ThoughtWorks UK Office 2
## 10 51.51575 -0.097978 BT Centre 1
Where do people go to meetups?
library(ggmap)
map = get_map(location = 'London', zoom = 12)
ggmap(map) +
  geom_point(aes(x = lon, y = lat, size = count),
             data = byVenue,
             col = "red",
             alpha = 0.8)
Spatial clustering
library(geosphere)
library(cluster)
clusteramounts = 40
distance.matrix = byVenue %>% select(lon, lat) %>% distm
clustersx <- as.hclust(agnes(distance.matrix, diss = TRUE))
byVenue$group <- cutree(clustersx, k=clusteramounts)
byVenueClustered = byVenue %>%
  group_by(group) %>%
  summarise(meanLat = mean(lat),
            meanLon = mean(lon),
            total = sum(count),
            venues = paste(venue, collapse = ","))
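The clustering slide above relies on `geosphere::distm` and `cluster::agnes`. As a rough sketch of the same idea using only base R, the example below computes great-circle distances with the haversine formula and clusters them with `hclust` using average linkage. Both are stand-ins chosen for illustration, not the calls used in the talk. The four venues and their coordinates come from the venue table shown earlier, and `k = 3` is an arbitrary choice for this tiny sample.

```r
# Great-circle distance in metres between two points (haversine formula).
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

venues <- data.frame(
  venue = c("Look Mum No Hands!", "Mind Candy HQ",
            "Novotel - W6 8DR", "Skyline Bar at CCT Venues Plus"),
  lat = c(51.52655, 51.52303, 51.49146, 51.50256),
  lon = c(-0.084219, -0.085178, -0.219424, -0.019379),
  count = c(22, 2, 1, 1)
)

# Pairwise distance matrix: entry [i, j] is the distance from venue i to j.
idx <- seq_len(nrow(venues))
distance_matrix <- outer(idx, idx, function(i, j) {
  haversine_m(venues$lon[i], venues$lat[i], venues$lon[j], venues$lat[j])
})

# Average-linkage hierarchical clustering, then cut the tree into k groups.
tree <- hclust(as.dist(distance_matrix), method = "average")
venues$group <- cutree(tree, k = 3)

# Total event count per cluster.
clustered <- aggregate(count ~ group, data = venues, FUN = sum)
print(venues[, c("venue", "group")])
print(clustered)
```

The two Shoreditch venues are a few hundred metres apart, so they end up in the same group, while the Hammersmith and Canary Wharf venues each form their own.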
Spatial clustering
## group meanLat meanLon total
## 1 3 51.52349 -0.08506461 123
## 2 1 51.52443 -0.09919280 89
## 3 2 51.50547 -0.10325925 62
## 4 4 51.50794 -0.12714600 55
## 5 8 51.51671 -0.10028908 19
## 6 6 51.53655 -0.13798514 18
## 7 7 51.52159 -0.10934720 18
## 8 5 51.51155 -0.07004417 13
## 9 12 51.51459 -0.12314650 13
## 10 14 51.52129 -0.07588867 10
Spatial clustering
ggmap(map) +
  geom_point(aes(x = meanLon, y = meanLat, size = total),
             data = byVenueClustered,
             col = "red",
             alpha = 0.8)
What’s going on in Shoreditch?
byVenue %>%
  filter(group == byVenueClustered$group[1])
Meetup Group Member Overlap
● Why would we want to know this?
○ Perhaps for joint meetups
○ Topics for future meetups
Extracting the data
MATCH (group1:Group), (group2:Group)
WHERE group1 <> group2
OPTIONAL MATCH p =
(group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
WITH group1, group2, COLLECT(p) AS paths
RETURN group1.name, group2.name,
LENGTH(paths) as commonMembers
ORDER BY group1.name, group2.name
Finding overlap as a percentage
MATCH (group1:Group), (group2:Group)
WHERE group1 <> group2
OPTIONAL MATCH (group1)<-[:MEMBER_OF]-(member)
WITH group1, group2, COLLECT(member) AS group1Members
WITH group1, group2, group1Members,
     LENGTH(group1Members) AS numberOfGroup1Members
UNWIND group1Members AS member
OPTIONAL MATCH path = (member)-[:MEMBER_OF]->(group2)
WITH group1, group2, COLLECT(path) AS paths, numberOfGroup1Members
WITH group1, group2, LENGTH(paths) as commonMembers, numberOfGroup1Members
RETURN group1.name, group2.name,
       toInt(round(100.0 * commonMembers / numberOfGroup1Members)) AS percentage
ORDER BY group1.name, group2.name
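To sanity-check what the percentage means, the same calculation can be sketched in plain R on a small membership table. Everything here is made up for illustration: the `memberships` data frame, the member ids and the helper name `overlap_percentage`. It assumes one row per (member, group) pair with no duplicates.

```r
# Made-up membership table: one row per (member, group) pair.
memberships <- data.frame(
  member = c("a", "b", "c", "d", "a", "b", "e", "a"),
  group  = c("Neo4j", "Neo4j", "Neo4j", "Neo4j",
             "MongoDB", "MongoDB", "MongoDB", "Cassandra"),
  stringsAsFactors = FALSE
)

# For every ordered pair of distinct groups, the share of group1's
# members who are also members of group2, as a rounded percentage.
overlap_percentage <- function(memberships) {
  members_by_group <- split(memberships$member, memberships$group)
  groups <- names(members_by_group)
  pairs <- expand.grid(group1 = groups, group2 = groups,
                       stringsAsFactors = FALSE)
  pairs <- pairs[pairs$group1 != pairs$group2, ]
  pairs$percentage <- mapply(function(g1, g2) {
    common <- length(intersect(members_by_group[[g1]],
                               members_by_group[[g2]]))
    round(100 * common / length(members_by_group[[g1]]))
  }, pairs$group1, pairs$group2, USE.NAMES = FALSE)
  pairs <- pairs[order(pairs$group1, pairs$group2), ]
  rownames(pairs) <- NULL
  pairs
}

overlap <- overlap_percentage(memberships)
print(overlap)
```

Because the denominator is the size of the first group, the measure is not symmetric: a small group can overlap 100% with a large one while the reverse percentage stays low.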
How many groups are people part of?
MATCH (p:MeetupProfile)-[:MEMBER_OF]->()
RETURN ID(p), COUNT(*) AS groups
ORDER BY groups DESC
How many groups are people part of?
ggplot(aes(x = groups, y = n),
       data = group_count %>% count(groups)) +
  geom_bar(stat="identity", fill="dark blue") +
  scale_y_sqrt() +
  scale_x_continuous(
    breaks = round(seq(min(group_count$groups),
                       max(group_count$groups), by = 1),1)) +
  ggtitle("Number of groups people are members of")
Who’s the most connected?
● i.e. the person who had the chance to meet the most people in the community
● Betweenness Centrality
● Page Rank
Betweenness Centrality
Calculates the number of shortest paths that go through a particular node
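To make the definition concrete, here is a small hand-rolled sketch of betweenness for an unweighted, undirected graph, following Brandes' algorithm. It is an illustration only: the talk itself calls igraph's `betweenness()` on a directed graph, and the helper name `betweenness_sketch` and the four-node example graph are invented here.

```r
# Betweenness for an unweighted, undirected graph given as a named
# adjacency list. Hypothetical helper for illustration.
betweenness_sketch <- function(adj) {
  nodes <- names(adj)
  zeros <- setNames(numeric(length(nodes)), nodes)
  score <- zeros
  for (s in nodes) {
    sigma <- zeros; sigma[s] <- 1      # number of shortest paths from s
    depth <- zeros - 1; depth[s] <- 0  # BFS depth, -1 means not visited
    preds <- setNames(vector("list", length(nodes)), nodes)
    visited <- character(0)
    queue <- s
    # Breadth-first search from s, counting shortest paths.
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      visited <- c(visited, v)
      for (w in adj[[v]]) {
        if (depth[w] < 0) {
          depth[w] <- depth[v] + 1
          queue <- c(queue, w)
        }
        if (depth[w] == depth[v] + 1) {
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    # Walk back from the farthest nodes, accumulating dependencies.
    delta <- zeros
    for (w in rev(visited)) {
      for (v in preds[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) score[w] <- score[w] + delta[w]
    }
  }
  # Each undirected pair was counted once from each endpoint.
  score / 2
}

# B is connected to A, C and D; the others only connect through B.
adj <- list(A = "B", B = c("A", "C", "D"), C = "B", D = "B")
scores <- betweenness_sketch(adj)
print(scores)
```

In this star-shaped example B lies on the only shortest path between each of the three other pairs, so it scores 3 and the leaves score 0.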
Betweenness Centrality
library(igraph)
nodes_query = "MATCH (p:MeetupProfile)-[:RSVPD]->({response: 'yes'})-[:TO]->(event)
RETURN DISTINCT ID(p) AS id, p.id AS name, p.name AS fullName"
nodes = cypher(graph, nodes_query)
edges_query = "MATCH (p:MeetupProfile)-[:RSVPD]->({response: 'yes'})-[:TO]->(event),
(event)<-[:TO]-({response:'yes'})<-[:RSVPD]-(other)
RETURN ID(p) AS source, ID(other) AS target, COUNT(*) AS weight"
edges = cypher(graph, edges_query)
g = graph.data.frame(edges, directed = T, nodes)
bwGraph = betweenness(g)
bwDf = data.frame(id = names(bwGraph), score = bwGraph)
Betweenness Centrality
bwDf %>% arrange(desc(score)) %>% head(5)
merge(nodes, bwDf, by.x = "name", by.y = "id") %>%
arrange(desc(score)) %>%
head(5)
Page Rank
PageRank works by counting the number and quality of
links to a page to determine a rough estimate of how
important the website is.
The underlying assumption is that more important websites
are likely to receive more links from other websites.
Page Rank
PageRank works by counting the number and quality of
links to a person to determine a rough estimate of how
important the person is.
The underlying assumption is that more important people
are likely to receive more links from other people.
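The idea can be sketched as a short power iteration in base R. This is a simplified illustration, not igraph's implementation (the talk uses `page.rank()`), and the function name `pagerank_sketch`, the damping factor of 0.85 and the four-person edge list are assumptions made for the example.

```r
# Simplified PageRank by power iteration over a directed edge list.
# Hypothetical helper for illustration.
pagerank_sketch <- function(edges, damping = 0.85, iterations = 100) {
  nodes <- sort(unique(c(edges$source, edges$target)))
  n <- length(nodes)
  out_degree <- setNames(
    as.numeric(table(factor(edges$source, levels = nodes))), nodes)
  pr <- setNames(rep(1 / n, n), nodes)
  for (i in seq_len(iterations)) {
    # Every node gets a baseline share, plus an even split of the rank
    # held by nodes that have no outgoing links.
    dangling <- sum(pr[out_degree == 0])
    new_pr <- setNames(rep((1 - damping) / n + damping * dangling / n, n),
                       nodes)
    # Each link passes on an equal share of its source's rank.
    for (k in seq_len(nrow(edges))) {
      from <- edges$source[k]
      to <- edges$target[k]
      new_pr[to] <- new_pr[to] + damping * pr[from] / out_degree[from]
    }
    pr <- new_pr
  }
  pr
}

# Made-up links: A, B and D all point at C; C points back at A.
edges <- data.frame(source = c("A", "B", "C", "D"),
                    target = c("C", "C", "A", "C"),
                    stringsAsFactors = FALSE)
ranks <- pagerank_sketch(edges)
print(sort(ranks, decreasing = TRUE))
```

C receives the most links and ranks highest. A ranks second despite having a single inbound link, because that link comes from the highly ranked C, which is the "quality of links" part of the description.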
Page Rank
pr = page.rank(g)$vector
prDf = data.frame(name = names(pr), rank = pr)
data.frame(merge(nodes, prDf, by.x = "name", by.y = "name")) %>%
arrange(desc(rank)) %>%
head(10)
Blending back into the graph
query = "MATCH (p:MeetupProfile {id: {id}}) SET p.betweenness = {score}"
tx = newTransaction(graph)
for(i in 1:nrow(bwDf)) {
  if(i %% 1000 == 0) {
    commit(tx)
    print(paste("Batch", i / 1000, "committed."))
    tx = newTransaction(graph)
  }
  id = bwDf[i, "id"]
  score = bwDf[i, "score"]
  appendCypher(tx, query, id = id, score = as.double(score))
}
commit(tx)
Blending back into the graph
query = "MATCH (p:MeetupProfile {id: {id}}) SET p.pageRank = {score}"
tx = newTransaction(graph)
for(i in 1:nrow(prDf)) {
  if(i %% 1000 == 0) {
    commit(tx)
    print(paste("Batch", i / 1000, "committed."))
    tx = newTransaction(graph)
  }
  name = prDf[i, "name"]
  rank = prDf[i, "rank"]
  appendCypher(tx, query, id = name, score = as.double(rank))
}
commit(tx)
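Both write-back loops follow the same pattern of committing roughly every 1000 rows. One possible way to factor that out is sketched below, with the database work replaced by a hypothetical `write_batch` callback so the example runs without Neo4j. In real use the callback would open a transaction, append one statement per row and commit. The names `batch_indices` and `write_in_batches` are invented for this sketch.

```r
# Split row indices 1..n into consecutive batches of at most `size` rows.
batch_indices <- function(n, size) {
  if (n == 0) return(list())
  unname(split(seq_len(n), ceiling(seq_len(n) / size)))
}

# Apply `write_batch` (a hypothetical stand-in for "open transaction,
# append statements, commit") to each batch of rows in turn.
write_in_batches <- function(df, size, write_batch) {
  batches <- batch_indices(nrow(df), size)
  for (b in seq_along(batches)) {
    write_batch(df[batches[[b]], , drop = FALSE])
    message("Batch ", b, " committed.")
  }
  invisible(length(batches))
}

# Made-up scores; the callback here only counts the rows it receives.
scores_df <- data.frame(id = as.character(1:2500), score = runif(2500))
rows_written <- 0
n_batches <- write_in_batches(scores_df, 1000, function(rows) {
  rows_written <<- rows_written + nrow(rows)
})
```

Keeping the batching separate from the write means the same helper could serve both the betweenness and the PageRank updates, with only the callback differing.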
Are they in the Neo4j group?
MATCH (p:MeetupProfile)
WITH p
ORDER BY p.pageRank DESC
LIMIT 20
OPTIONAL MATCH member = (p)-[m:MEMBER_OF]->(g:Group)
WHERE g.name = "Neo4j - London User Group"
RETURN p.name, p.id, p.pageRank, NOT m is null AS isMember
ORDER BY p.pageRank DESC
Are they in the Neo4j group?
blended_data = cypher(graph, query)
Have they been to any events?
MATCH (p:MeetupProfile)
WITH p
ORDER BY p.pageRank DESC
LIMIT 20
OPTIONAL MATCH member = (p)-[m:MEMBER_OF]->(g:Group)
WHERE g.name = "Neo4j - London User Group"
WITH p, NOT m is null AS isMember, g
OPTIONAL MATCH event= (p)-[:RSVPD]-({response:'yes'})-[:TO]->()<-[:HOSTED_EVENT]-(g)
WITH p, isMember, COLLECT(event) as events
RETURN p.name, p.id, p.pageRank, isMember, LENGTH(events) AS events
ORDER BY p.pageRank DESC
Have they been to any events?
blended_data = cypher(graph, query)
Take Aways
● ggplot => visualisations with minimal code
● dplyr => easy data manipulation for people from other languages
● igraph => find the influencers in a network
● graphs => flexible way of modelling data that allows querying across multiple dimensions
And one final take away...
Get the code
http://github.com/mneedham/neo4j-meetup
