Neo4j
After 1 year in production
with Andrey Nikishaev
What we will talk about today
Neo4j internals
Cypher - query language
Developing extensions
Neo4j in production
Conclusion
Data
Properties
Linked lists of property records, each holding a key:value pair.
Node
Refers to its first property and to the first relationship in its relationship chain.
Relationship
Refers to its first property and to its start and end nodes.
It also refers to the previous and next relationships of its start and end nodes.
All data in Neo4j is stored as linked lists of fixed-size records.
● ID lookup = O(1)
● It's great at localized searches, e.g. getting the people you follow.
● It's not great at aggregation. Nodes and relationships aren't stored in any sorted order, so deriving the 20 most popular users requires a full scan.
● It suffers from the "supernode problem". At least currently, a node's neighboring relationships are stored as a flat list, so if you have a million followers, fetching even one person you follow is slow.
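The points above can be sketched in a few lines of Java. This is only an illustration, not Neo4j's store code: the record size, the class names and the single-chain `Rel` record are assumptions made for the example. It shows why an ID lookup is a constant-time offset calculation and why reading a node's neighbours means walking its whole relationship chain.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordStoreSketch {
    // Hypothetical record size in bytes; the real store files use their own fixed sizes.
    public static final int RECORD_SIZE = 16;
    public static final long NO_REL = -1;

    // O(1) lookup: with fixed-size records, the byte offset is simply id * record size.
    public static long offsetOf(long id) {
        return id * RECORD_SIZE;
    }

    // A relationship record, simplified to a single chain (the start node's).
    public static final class Rel {
        public final long endNode;
        public final long nextRel; // next relationship in the start node's chain

        public Rel(long endNode, long nextRel) {
            this.endNode = endNode;
            this.nextRel = nextRel;
        }
    }

    // Follows the chain from the node's first relationship; cost grows with node degree.
    public static List<Long> neighbours(Rel[] rels, long firstRel) {
        List<Long> out = new ArrayList<>();
        for (long id = firstRel; id != NO_REL; id = rels[(int) id].nextRel) {
            out.add(rels[(int) id].endNode);
        }
        return out;
    }
}
```

For a supernode the loop in `neighbours` visits every relationship in the chain, which is the cost described in the last bullet.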
Caching
File Cache
Blocks of the same size.
Blocks are mapped into memory with OS mmap.
Evicts data by an LFU policy (hits vs misses).
Object Cache (removed in v2.3+)
Saves serialized data to memory to speed up queries.
No eviction policy (it can eat all your memory).
Entries are evicted only on transaction log sync (HA) or data deletion.
To use it, warm it up with a query like this:
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN count(n.prop) + count(r.prop);
Transactions
The Tx context is kept in a thread-local object.
Commit steps:
● Gather the list of commands.
● Sort the commands (predictable execution order).
● Write the commands to the Tx log.
● Mark the Tx in the log as finished.
● Write to the DB.
Diagram labels: Tx Log, Tx ID.
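The commit flow on this slide can be sketched as a toy write-ahead log in Java. Everything here is an assumption for illustration (the `Command` class, the log line format, the in-memory `txLog` and `store`); it only mirrors the order of the steps, not Neo4j's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TxCommitSketch {
    // One command changes one record; names are illustrative only.
    public static final class Command {
        public final long recordId;
        public final String value;

        public Command(long recordId, String value) {
            this.recordId = recordId;
            this.value = value;
        }
    }

    public final List<String> txLog = new ArrayList<>();    // stand-in for the Tx log file
    public final Map<Long, String> store = new TreeMap<>(); // stand-in for the store files
    private long lastTxId = 0;

    public long commit(List<Command> gathered) {
        long txId = ++lastTxId;
        // 1. Sort the gathered commands so every Tx touches records in the same order.
        List<Command> sorted = new ArrayList<>(gathered);
        sorted.sort(Comparator.comparingLong(c -> c.recordId));
        // 2. Write the commands to the log before touching the store.
        for (Command c : sorted) {
            txLog.add("tx" + txId + " SET " + c.recordId + "=" + c.value);
        }
        // 3. Mark the Tx as finished in the log.
        txLog.add("tx" + txId + " DONE");
        // 4. Only now apply the changes to the store.
        for (Command c : sorted) {
            store.put(c.recordId, c.value);
        }
        return txId;
    }
}
```

Because the log entry is complete before the store is modified, a crash between steps 3 and 4 can in principle be recovered by replaying the log.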
HA
Master-slave replication only
● Slaves sync at a configurable interval.
● All writes go through the master. Writes sent to a slave are slower.
● Same node/relationship IDs on all servers.
● Needs a quorum for writes; otherwise the cluster goes into read-only mode.
● IDs are allocated in blocks.
● The master is elected by these rules:
○ Highest Tx ID.
○ If multiple: instance that was master for this Tx.
○ If unavailable: instance with the lowest clock value.
○ If multiple: instance with the lowest ID.
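The election rules above amount to a chain of tie-breakers, which can be sketched as a Java comparator. The `Candidate` fields are hypothetical names chosen for this example; the ordering follows the four rules as listed on the slide, not Neo4j's source.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MasterElectionSketch {
    // Hypothetical view of one cluster member at election time.
    public static final class Candidate {
        public final int serverId;
        public final long lastTxId;
        public final boolean wasMasterForLastTx;
        public final long clock;

        public Candidate(int serverId, long lastTxId, boolean wasMasterForLastTx, long clock) {
            this.serverId = serverId;
            this.lastTxId = lastTxId;
            this.wasMasterForLastTx = wasMasterForLastTx;
            this.clock = clock;
        }
    }

    // The preferred candidate sorts first.
    public static final Comparator<Candidate> PREFERENCE =
            Comparator.comparingLong((Candidate c) -> c.lastTxId).reversed() // highest Tx ID
                    .thenComparing((Candidate c) -> !c.wasMasterForLastTx)   // was master for that Tx
                    .thenComparingLong(c -> c.clock)                         // lowest clock value
                    .thenComparingInt(c -> c.serverId);                      // lowest server ID

    public static Candidate elect(List<Candidate> candidates) {
        return Collections.min(candidates, PREFERENCE);
    }
}
```

Each later rule is consulted only when all earlier ones tie, which matches the "if multiple" wording of the list.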
Cypher
MATCH (girl:Girl)
WHERE girl.age > 18 AND girl.age < 25
  AND (
    NOT (girl)-[:HAS_BOYFRIEND]->(:Guy)
    OR NOT (girl)-[:HAS_BOYFRIEND]->(:Guy)-[:ENGAGED_IN]->(:Gym)
  )
RETURN girl
ORDER BY girl.age ASC
Cypher
No query watcher
You have to control every query that goes to the server, because a single query can kill the server.
Reads all data first
When a query touches properties (the expand operation), the data is cached in memory. If it does not fit, the query will crash (or even the server). Even MATCH (n) DELETE n will fail if you have many nodes.
Locking
Running an update query doesn't mean that you hold an update lock, even inside a transaction.
FAIL (the value is read before any lock is taken):
MATCH (n:Node)
SET n.count = n.count + 1
PASS (writing a dummy property first takes the write lock):
MATCH (n:Node)
SET n._lock = true
SET n.count = n.count + 1
More about this at: http://goo.gl/Cy3MEU
Cypher
You can try it on real data for free here: https://neo4j.com/sandbox-v2/
Similarity example.
Uses the recommendation dataset: 32,314 nodes, 332,622 relationships.
Top 25 similar users:
MATCH (u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
RETURN [u1.name, u2.name] AS pairs, count(*) AS cnt
ORDER BY cnt DESC
LIMIT 25
Run time: 16,366 ms. Number of pairs: 6,246,674
Most queries will not work without warming up.
Use indexes as much as possible.
Cypher
> Sushi restaurants in New York that my friends like.
MATCH (person:Person)-[:IS_FRIEND_OF]->(friend),
(friend)-[:LIKES]->(restaurant:Restaurant),
(restaurant)-[:LOCATED_IN]->(loc:Location),
(restaurant)-[:SERVES]->(type:Cuisine)
WHERE person.name = 'Philip'
AND loc.location = 'New York'
AND type.cuisine = 'Sushi'
RETURN restaurant.name, count(*) AS occurrence
ORDER BY occurrence DESC
LIMIT 5
https://neo4j.com/developer/guide-build-a-recommendation-engine/
Developing extensions
User-Defined Procedures & Functions
The same idea as stored procedures and functions in SQL databases.
Unmanaged server extensions
Extensions that can add a new API for working with Neo4j. You can even build a new dashboard.
Server plugins
Extensions that can only extend the Neo4j core API.
Kernel extensions
Here you can do almost anything.
https://github.com/creotiv/neo4j-kernel-plugin-example
User-Defined Procedures & Functions (v3.0+ only)
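package org.neo4j.examples; // the Cypher name is the package name plus the method name

import java.util.List;
import org.neo4j.procedure.Description;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.UserFunction;

public class Join
{
    @UserFunction
    @Description("org.neo4j.examples.join(['s1','s2',...], delimiter) - join the given strings with the given delimiter.")
    public String join(
            @Name("strings") List<String> strings,
            @Name(value = "delimiter", defaultValue = ",") String delimiter) {
        // Return null rather than throwing when an argument is missing.
        if (strings == null || delimiter == null) {
            return null;
        }
        return String.join(delimiter, strings);
    }
}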
Calling:
MATCH (p: Person)
WHERE p.age = 36
RETURN org.neo4j.examples.join(collect(p.names))
Unmanaged extensions
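import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.Status;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.string.UTF8;

@Path("/helloworld")
public class HelloWorldResource {
    private final GraphDatabaseService database;

    // The server injects the database through @Context.
    public HelloWorldResource(@Context GraphDatabaseService database) {
        this.database = database;
    }

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    @Path("/{nodeId}")
    public Response hello(@PathParam("nodeId") long nodeId) {
        return Response.status(Status.OK).entity(
                UTF8.encode("Hello World, nodeId=" + nodeId)).build();
    }
}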
Kernel extensions - Factory
public class ExampleKernelExtensionFactory extends KernelExtensionFactory<ExampleKernelExtensionFactory.Dependencies> {
    public static abstract class ExampleSettings {
        public static Setting<Boolean> debug = setting("examplekernelextension.debug", BOOLEAN, Settings.FALSE);
    }

    public ExampleKernelExtensionFactory() {
        super(SERVICE_NAME);
    }

    @Override
    public Lifecycle newKernelExtension(Dependencies dependencies) throws Throwable {
        Config config = dependencies.getConfig();
        return new ExampleExtension(dependencies.getGraphDatabaseService(), config.get(ExampleSettings.debug), ...);
    }

    public interface Dependencies {
        GraphDatabaseService getGraphDatabaseService();
        Config getConfig();
    }
}
Kernel extensions - Extension
public class ExampleExtension implements Lifecycle {
    ...
    public ExampleExtension(GraphDatabaseService gds, Boolean debug, String somevar) {
        this.gds = gds;
        this.debug = debug;
        this.somevar = somevar;
    }

    @Override
    public void init() throws Throwable {
        handler = new ExampleEventHandler(gds, debug, somevar);
        gds.registerTransactionEventHandler(handler);
    }

    ... Start/Stop methods ...

    @Override
    public void shutdown() throws Throwable {
        gds.unregisterTransactionEventHandler(handler);
    }
}
Kernel extensions - Event Handler
class ExampleEventHandler implements TransactionEventHandler<String> {
    ...
    @Override
    public String beforeCommit(TransactionData transactionData) throws Exception {
        updateConstraints();
        return prepareCreatedNodes(transactionData);
    }

    @Override
    public void afterCommit(TransactionData transactionData, String result) {
        processCreatedNodes(result);
    }

    @Override
    public void afterRollback(TransactionData transactionData, String result) {
        error("Something bad happened, Harry: " + result);
    }
}
Kernel extensions - Event Handler
Problems
beforeCommit (should run while the DB is still unchanged)
You can't access the properties, labels, or relationships of deleted nodes, because they are already deleted. Strange, but true. You need to gather them from the transaction event data.
afterCommit (should run after the transaction is committed and closed)
It is executed while the transaction is still open, which leads to a deadlock (with no message or exception) if you try to update your local DB.
Local DB
- Bad API.
- You can't access the HA status of the local server; you need to go through the REST API.
- No way to access the user request.
- Plugins can conflict with each other and cause deadlocks.
Neo4j in Production
Neo4j in Production - Cache-Based Sharding
[Diagram: a Router in front of three instances labeled Cache A, Cache B and Cache C]
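The slide only shows a router in front of three caches, so the following Java sketch is an assumption about how such a router might work: requests with the same routing key always go to the same instance, so each instance keeps a different part of the graph warm in its cache. The class name, the instance names and the choice of a simple hash are all hypothetical.

```java
import java.util.List;

public class CacheShardingRouterSketch {
    private final List<String> instances;

    public CacheShardingRouterSketch(List<String> instances) {
        this.instances = instances;
    }

    // Same routing key -> same instance, so that instance's cache stays warm
    // for the slice of the graph this key usually touches.
    public String route(String routingKey) {
        int bucket = Math.floorMod(routingKey.hashCode(), instances.size());
        return instances.get(bucket);
    }
}
```

Every instance still holds the full dataset; only the cached working set differs. A plain modulo reshuffles most keys when an instance is added or removed, so a real router would likely need something more stable, such as consistent hashing.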
Neo4j in Production - Settings
Log slow queries
dbms.querylog.enabled=true
dbms.querylog.threshold=4s
Logical logs for debug
keep_logical_logs=7 days
Enable online backup
online_backup_enabled=true
online_backup_server=127.0.0.1:6362
Number of threads (for concurrent access; default: number of CPUs)
org.neo4j.server.webserver.maxthreads=64
Memory used for page cache
dbms.pagecache.memory=2g
Interval at which slaves pull updates from the master (in seconds)
ha.pull_interval=10
Without timeout replication
Number of slaves to which a Tx is pushed upon commit on the master (optimistic: the Tx can be marked successful even if some pushes failed)
ha.tx_push_factor=1
Push strategy (fixed pushes Txs to slaves in server ID order)
ha.tx_push_strategy=fixed|round_robin
Master-to-slave communication chunk size
ha.com_chunk_size=2M
Maximum number of connections a slave can have to the master
ha.max_concurrent_channels_per_slave=20
http://neo4j.com/docs/stable/ha-configuration.html
Neo4j in Production - Performance
Use SSD
It is much cheaper than 16-32 GB of RAM.
I/O tuning
Disable file and directory access time updates.
Set the deadline scheduler for disk operations. This will increase read speed but decrease write speed.
$ echo 'deadline' > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
Memory tuning
Set dbms.pagecache.memory to the size of the *store*.db files + 20-40% for growth.
Leave some memory for the OS:
OS Memory = 1GB + (size of graph.db/index) + (size of graph.db/schema)
If you see swapping, increase the OS memory size.
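The sizing rules above are plain arithmetic, sketched here in Java. The method names are made up for the example, and the inputs (store, index and schema sizes) are whatever you measure on disk.

```java
public class MemorySizingSketch {
    public static final long GB = 1024L * 1024 * 1024;

    // Page cache = size of the store files plus headroom for growth (20 to 40 percent per the slide).
    public static long pageCacheBytes(long storeBytes, int growthPercent) {
        return storeBytes + storeBytes * growthPercent / 100;
    }

    // OS memory = 1 GB + size of graph.db/index + size of graph.db/schema.
    public static long osReserveBytes(long indexBytes, long schemaBytes) {
        return GB + indexBytes + schemaBytes;
    }
}
```

For example, a 10 GB store with 20% headroom gives a 12 GB page cache, and a 2 GB index directory with a 512 MB schema directory gives a 3.5 GB OS reserve.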
JVM tuning
Set dbms.memory.heap.initial_size and dbms.memory.heap.max_size to the same size to avoid unwanted full garbage collection pauses.
Use the concurrent garbage collector: -XX:+UseG1GC
Set the old/new generation ratio with -XX:NewRatio=N (minimum 1; calculated as old/new = ratio).
The more data your Txs update, the lower the ratio you need.
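To make the old/new = ratio rule concrete, here is a small Java sketch of the split for a given heap size. The helper names are hypothetical; the formula just restates the definition above (the new generation is one part, the old generation is N parts).

```java
public class NewRatioSketch {
    // With old/new = ratio, the new generation gets 1 / (ratio + 1) of the heap.
    public static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    // The old generation gets the rest.
    public static long oldGenBytes(long heapBytes, int newRatio) {
        return heapBytes - youngGenBytes(heapBytes, newRatio);
    }
}
```

With an 8 GB heap, a ratio of 3 leaves 2 GB for the new generation, while a ratio of 1 leaves 4 GB, which is why update-heavy workloads want a lower ratio.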
Neo4j in Production - Problems
- Based on Java
- Not stable
- Problems with memory use and control
- No control over queries
- Problems with some silly queries like “delete all”
- No sharding
- No cross-DC replication
- No master-master replication
- Query planning is a mystery
- Can’t work without a large amount of memory
- Dashboard shows unrealistic execution times
- Plugin deployment is hell
- Problems with data loss on master death
- Problems with unsynced data during requests
- Coming soon ...
Conclusion
70/30
Thank You!
User stories: https://neo4j.com/case-studies/
Free sandbox with data: https://neo4j.com/sandbox-v2/
Kernel extension example: https://github.com/creotiv/neo4j-kernel-plugin-example
Advanced locking: http://goo.gl/Cy3MEU
HA configuration: http://neo4j.com/docs/stable/ha-configuration.html
Andrey Nikishaev
creotiv@gmail.com
fb.me/anikishaev
