Graph Databases and Neo4j
Tobias Ivarsson, Hacker @ Neo Technology
twitter: @thobe / #neo4j
email: tobias@neotechnology.com
web: http://www.neo4j.org/
web: http://www.thobe.org/
NOSQL - Why now?
    Four trends


Trend 1: Data size
Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.

Exabytes (10¹⁸ bytes) of data stored per year:
   2006: 161
   2007: 253
   2008: 397
   2009: 623
   2010: 988
Data source: IDC 2007
Trend 2: Connectedness
Over time data has evolved to be more and more interlinked and connected. Hypertext has links, blogs have pingback, tagging groups all related data.

[Figure: information connectivity (y-axis) over time (x-axis, 1990 to 2020, with the eras web 1.0, web 2.0 and “web 3.0”). In rising order of connectivity: text documents, hypertext, RSS, blogs, wikis, user-generated content, tagging, folksonomies, RDF, ontologies, Giant Global Graph (GGG).]
Trend 3: Semi-structure
๏ Individualization of content
   • In the salary lists of the 1970s, all elements had exactly one job
   • In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?


๏ All encompassing “entire world views”
   • Store more data about each entity
๏ Trend accelerated by the decentralization of content generation
     that is the hallmark of the age of participation (“web 2.0”)



Trend 4: Architecture
๏ 1980s: Mainframe applications
   [Diagram: one Application on top of one DB]
๏ 1990s: Database as integration hub
   [Diagram: three Applications sharing one DB]
๏ 2000s: (moving towards) Decoupled services with their own backend
   [Diagram: three Applications, each with its own DB]
Why NOSQL Now?

๏Trend 1: Size
๏Trend 2: Connectedness
๏Trend 3: Semi-structure
๏Trend 4: Architecture

RDBMS performance
We are building applications today that have complexity requirements that a Relational Database cannot handle with sufficient performance.

[Figure: performance (y-axis) against data complexity (x-axis), with one curve for the relational database and one for the requirement of the application. Marked along the data complexity axis, from low to high: salary list, majority of webapps, social network, semantic trading, custom.]
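Scaling to size vs. Scaling to complexity
[Figure: size (y-axis) against complexity (x-axis). From largest size and lowest complexity to highest complexity: key/value stores, Bigtable clones, document databases, graph databases. Graph databases are annotated “billions of nodes and relationships”. A marker on the chart reads “> 90% of use cases”.]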
Graph Databases focus on the structure of data
Graph databases focus on the structure of the data, scaling to the complexity of the data and of the application.
What is Neo4j?
๏ Neo4j is a Graph Database
   • Non-relational (“#nosql”), transactional (ACID), embedded
   • Data is stored as a Graph / Network
      ‣Nodes and relationships with properties
      ‣“Property Graph” or “edge-labeled multidigraph”
   • Schema free, bottom-up data model design
๏ Neo4j is Open Source / Free (as in speech) Software
   • AGPLv3
   • Commercial (“dual license”) license available
      ‣First server is free (as in beer), next is inexpensive

Prices are available at http://neotechnology.com/
Contact us if you have questions and/or special license needs (e.g. if you want an evaluation license)
More about Neo4j
๏ Neo4j is stable
   • In 24/7 operation since 2003
๏ Neo4j is in active development
   • Neo Technology received VC funding October 2009
๏ Neo4j delivers high performance graph operations
   • traverses 1’000’000+ relationships / second
       on commodity hardware




The Neo4j Graph data model
•Nodes
•Relationships between Nodes
•Relationships have Labels
•Relationships are directed, but traversed at equal speed in both directions
•The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
•Nodes have key-value properties
•Relationships have key-value properties

[Diagram: example graph with three nodes and five relationships]
   Node: name: “James”, age: 32, twitter: “@spam”
   Node: name: “Mary”, age: 35
   Node: brand: “Volvo”, model: “V70”
   Relationships between James and Mary: LOVES (one in each direction), LIVES WITH
   Relationships to the Volvo node: OWNS (item type: “car”), DRIVES
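The bullet points above can be sketched in a few lines of plain Python. This is only an illustration of the property graph idea, not the Neo4j API: the `Node`, `Relationship` and `neighbours` names are made up for this sketch, and which person OWNS or DRIVES the car is an assumption, since the slide text does not say.

```python
class Node:
    """A node with key-value properties and its attached relationships."""

    def __init__(self, properties=None):
        self.properties = dict(properties or {})
        self.outgoing = []  # relationships that start at this node
        self.incoming = []  # relationships that end at this node


class Relationship:
    """A directed, labeled relationship with its own key-value properties."""

    def __init__(self, start, label, end, properties=None):
        self.start, self.label, self.end = start, label, end
        self.properties = dict(properties or {})
        # Register on both ends so either direction is equally cheap to follow.
        start.outgoing.append(self)
        end.incoming.append(self)


def neighbours(node, label, direction="both"):
    """Follow relationships with the given label: "out", "in" or "both"."""
    found = []
    if direction in ("out", "both"):
        found += [r.end for r in node.outgoing if r.label == label]
    if direction in ("in", "both"):
        found += [r.start for r in node.incoming if r.label == label]
    return found


james = Node({"name": "James", "age": 32, "twitter": "@spam"})
mary = Node({"name": "Mary", "age": 35})
car = Node({"brand": "Volvo", "model": "V70"})

Relationship(james, "LOVES", mary)
Relationship(mary, "LOVES", james)
Relationship(james, "LIVES WITH", mary)   # stored once, read in both directions
Relationship(james, "OWNS", car, {"item type": "car"})  # assumed owner
Relationship(mary, "DRIVES", car)                        # assumed driver

# The application decides to ignore direction for LIVES WITH.
print([n.properties["name"] for n in neighbours(mary, "LIVES WITH")])
```

Storing LIVES WITH once and reading it with `direction="both"` is one way to let the application, rather than the store, decide what the direction means.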
Graphs are all around us
          A        B        C      D                 ...
   1      17       3.14     3      17.79333333333
   2      42       10.11    14     30.33
   3      316      6.66     1      2104.56
   4      32       9.11     592    0.492432432432
   5                               2153.175765766
   ...

Even if this spreadsheet looks like it could be a fit for a RDBMS it isn’t:
•RDBMSes have problems with extending indefinitely on both rows and columns
•Formulas and data dependencies would quickly lead to heavy join operations
Graphs are all around us
          A        B        C      D                 ...
   1      17       3.14     3      = A1 * B1 / C1
   2      42       10.11    14     = A2 * B2 / C2
   3      316      6.66     1      = A3 * B3 / C3
   4      32       9.11     592    = A4 * B4 / C4
   5                               = SUM(D1:D4)
   ...

With data dependencies the spreadsheet turns out to be a graph.
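To make the point concrete, here is a minimal Python sketch that treats each formula cell as a node with DEPENDS_ON edges to the cells it reads, and evaluates a cell by walking those edges. The `values`, `formulas` and `evaluate` names are invented for this sketch, and it assumes the dependencies contain no cycles.

```python
# Plain value cells from the slide.
values = {
    "A1": 17, "B1": 3.14, "C1": 3,
    "A2": 42, "B2": 10.11, "C2": 14,
    "A3": 316, "B3": 6.66, "C3": 1,
    "A4": 32, "B4": 9.11, "C4": 592,
}


def ratio(a, b, c):
    """The row formula from the slide: A * B / C."""
    return a * b / c


# Formula cells: (cells this cell depends on, function of their values).
formulas = {
    "D1": (("A1", "B1", "C1"), ratio),
    "D2": (("A2", "B2", "C2"), ratio),
    "D3": (("A3", "B3", "C3"), ratio),
    "D4": (("A4", "B4", "C4"), ratio),
    "D5": (("D1", "D2", "D3", "D4"), lambda *xs: sum(xs)),
}


def evaluate(cell, cache=None):
    """Depth-first walk along dependency edges, caching visited cells."""
    cache = {} if cache is None else cache
    if cell in cache:
        return cache[cell]
    if cell in values:
        result = values[cell]
    else:
        dependencies, function = formulas[cell]
        result = function(*(evaluate(d, cache) for d in dependencies))
    cache[cell] = result
    return result


print(round(evaluate("D5"), 6))
```

Evaluating D5 never needs a join: it simply follows the edges from D5 to D1..D4 and from there to the value cells.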
Graphs are all around us
If we add external data sources the problem becomes even more interesting...

   17     3.14      3     = A1 * B1 / C1
   42     10.11     14    = A2 * B2 / C2
   316    6.66      1     = A3 * B3 / C3
   32     9.11      592   = A4 * B4 / C4
                          = SUM(D1:D4)
Graphs are whiteboard friendly
An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.

[Figure, shown in three steps: a drawing of the domain model; the same drawing annotated with ER multiplicities (1 and *); the same drawing with instance labels “thobe”, “Joe project blog”, “Wardrobe Strength”, “Hello Joe”, “Modularizing Jython” and “Neo4j performance analysis”.]
Image credits: Tobias Ivarsson
Query Languages
๏ Traversal APIs
   • Neo4j core traversers
   • Blueprint pipes
๏ SPARQL - “SQL for linked data” - query by graph pattern matching
   Find all persons that KNOWS a friend that KNOWS someone named “Larry Ellison”:
   SELECT ?person WHERE {
       ?person neo4j:KNOWS ?friend .
       ?friend neo4j:KNOWS ?foe .
       ?foe neo4j:name "Larry Ellison" .
   }

๏ Gremlin - “perl for graphs” - query by traversal
   Give me the names of all the people I know that are older than 30:
   ./outE[@label='KNOWS']/inV[@age > 30]/@name
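As a rough illustration of the two query styles, the sketch below runs both example queries against a tiny in-memory graph in plain Python. It uses neither SPARQL nor Gremlin; the sample people, the `edges` and `props` structures and both function names are hypothetical and exist only for this sketch.

```python
# Hypothetical sample data: (start, label, end) triples plus node properties.
edges = [
    ("me", "KNOWS", "alice"),
    ("me", "KNOWS", "bob"),
    ("alice", "KNOWS", "carol"),
    ("carol", "KNOWS", "larry"),
    ("bob", "WORKS_WITH", "larry"),
]
props = {
    "me": {"name": "Me", "age": 28},
    "alice": {"name": "Alice", "age": 34},
    "bob": {"name": "Bob", "age": 25},
    "carol": {"name": "Carol", "age": 41},
    "larry": {"name": "Larry Ellison"},
}


def out(node, label):
    """Nodes reached from `node` over outgoing relationships with `label`."""
    return [end for start, lbl, end in edges if start == node and lbl == label]


def knows_someone_who_knows(target_name):
    """Pattern matching: ?person KNOWS ?friend, ?friend KNOWS ?foe named target."""
    return sorted({
        person
        for person in props
        for friend in out(person, "KNOWS")
        for foe in out(friend, "KNOWS")
        if props[foe]["name"] == target_name
    })


def names_of_known_older_than(start, age):
    """Traversal: follow KNOWS from `start`, filter on age, return names."""
    return [
        props[n]["name"]
        for n in out(start, "KNOWS")
        if props[n].get("age", 0) > age
    ]


print(knows_someone_who_knows("Larry Ellison"))
print(names_of_known_older_than("me", 30))
```

The first function tries every binding of the pattern variables, while the second starts at one node and only looks at what is reachable from it, which is the essential difference between the two approaches.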
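Data manipulation API
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;

// Relationship types used in the example
enum RelTypes implements RelationshipType { KNOWS }

GraphDatabaseService graphDb = getGraphDbInstanceSomehow();
// Modifying operations have to be performed inside a transaction
Transaction tx = graphDb.beginTx();
try {
   // Create Thomas 'Neo' Anderson
   Node mrAnderson = graphDb.createNode();
   mrAnderson.setProperty( "name", "Thomas Anderson" );
   mrAnderson.setProperty( "age", 29 );

   // Create Morpheus
   Node morpheus = graphDb.createNode();
   morpheus.setProperty( "name", "Morpheus" );
   morpheus.setProperty( "rank", "Captain" );
   morpheus.setProperty( "occupation", "Total bad ass" );

   // Create relationship representing they know each other
   mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
   // ... similarly for Trinity, Cypher, Agent Smith, Architect
   tx.success();   // mark the transaction as successful
} finally {
   tx.finish();    // commits if success() was called, otherwise rolls back
}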
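The success/finish idiom can be illustrated with a toy transaction in Python. This is a sketch of the pattern only and says nothing about how Neo4j implements transactions: `ToyTransaction` and `create_node` are hypothetical names, and the "database" is just a dict.

```python
class ToyTransaction:
    """Buffers writes and applies them on finish() only if success() was called."""

    def __init__(self, store):
        self._store = store
        self._pending = {}
        self._successful = False

    def set(self, key, value):
        self._pending[key] = value

    def success(self):
        self._successful = True

    def finish(self):
        if self._successful:
            self._store.update(self._pending)  # commit
        self._pending = {}                     # otherwise: roll back


def create_node(store, name, fail=False):
    tx = ToyTransaction(store)
    try:
        tx.set(name, {"name": name})
        if fail:
            raise RuntimeError("simulated failure before success()")
        tx.success()
    finally:
        tx.finish()  # always runs, commits only after success()


store = {}
create_node(store, "Morpheus")
try:
    create_node(store, "Trinity", fail=True)
except RuntimeError:
    pass
print(sorted(store))
```

Because `finish()` sits in the `finally` block, an exception anywhere before `success()` leaves the store untouched.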
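Graph traversals
[Diagram: example graph]
   Nodes:
      name: “Thomas Anderson”, age: 29
      name: “Morpheus”, rank: “Captain”, occupation: “Total badass”
      name: “Trinity”
      name: “Cypher”, last name: “Reagan”
      name: “Agent Smith”, version: “1.0b”, language: “C++”
      name: “The Architect”
   Relationships:
      Thomas Anderson KNOWS Morpheus
      Thomas Anderson KNOWS Trinity
      Trinity LOVES Thomas Anderson (since: “meeting the oracle”)
      Morpheus KNOWS Trinity (since: “a year before the movie”, cooperates on: “The Nebuchadnezzar”)
      Morpheus KNOWS Cypher (disclosure: “public”)
      Cypher KNOWS Agent Smith (disclosure: “secret”)
      Agent Smith CODED BY The Architect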
Graph traversals
import neo4j

class Friends(neo4j.Traversal): # Traversals serve as queries in Neo4j
   types = [ neo4j.Outgoing.KNOWS ]
   order = neo4j.BREADTH_FIRST
   stop = neo4j.STOP_AT_END_OF_GRAPH
   returnable = neo4j.RETURN_ALL_BUT_START_NODE

for friend_node in Friends(mr_anderson):
   print("%s (@ depth=%s)" % ( friend_node["name"],
                                friend_node.depth ))

Output:
   Morpheus (@ depth=1)
   Trinity (@ depth=1)
   Cypher (@ depth=2)
   Agent Smith (@ depth=3)
Finding a place to start
๏ Traversals need a Node to start from
    • QUESTION: How do I find the start Node?
    • ANSWER: You use an Index
๏ Indexes in Neo4j are different from Indexes in Relational Databases
    • RDBMSes use them for joining
    • Neo4j uses them for simple lookup
IndexService index = getGraphDbIndexServiceSomehow();

Node mrAnderson = index.getSingleNode( "name",
                                        "Thomas Anderson" );

performTraversalFrom( mrAnderson );
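The lookup step above amounts to an exact-match map from a (key, value) pair to a node. Here is a minimal sketch in plain Python; the class ExactIndex and its methods are hypothetical stand-ins and do not reproduce the IndexService API.

```python
class ExactIndex:
    """Toy exact-match index: maps (key, value) to the nodes carrying it."""

    def __init__(self):
        self._entries = {}

    def index(self, node, key, value):
        # Register node under the given property key and value.
        self._entries.setdefault((key, value), []).append(node)

    def get_single_node(self, key, value):
        # Return the only matching node, None if there is no match,
        # and fail loudly if the lookup is ambiguous.
        hits = self._entries.get((key, value), [])
        if len(hits) > 1:
            raise LookupError("more than one node for %s=%r" % (key, value))
        return hits[0] if hits else None

index = ExactIndex()
mr_anderson = {"name": "Thomas Anderson", "age": 29}
index.index(mr_anderson, "name", mr_anderson["name"])

# The index is only used to find the start node; everything after
# that would be done by traversing relationships.
start = index.get_single_node("name", "Thomas Anderson")
```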
Indexes in Neo4j
๏ The Graph *is* the main index
   • Use relationship labels for navigation
   • Build index structures *in the graph*
     ‣Search trees, tag clouds, geospatial indexes, etc.
     ‣Linked/skip lists or other data structures in the graph
     ‣We have utility libraries for this
๏ External indexes used *for lookup*
   • Finding one or more points to start traversals from
   • Major difference from RDBMSes, which use indexes for everything
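To make "index structures in the graph" concrete, here is a small sketch of a binary search tree whose index nodes are linked by LEFT and RIGHT relationships and point at the indexed data through an INDEXES relationship. It is plain Python; the class IndexNode, the relationship names and the helper functions are assumptions chosen for illustration and are not the Neo4j utility libraries mentioned above.

```python
class IndexNode:
    """An index node: a key plus typed relationships to other nodes."""

    def __init__(self, key, target):
        self.key = key
        self.relationships = {"INDEXES": target}

def insert(root, key, target):
    """Walk LEFT/RIGHT relationships from root and attach a new index node."""
    current = root
    while key != current.key:
        side = "LEFT" if key < current.key else "RIGHT"
        if side not in current.relationships:
            current.relationships[side] = IndexNode(key, target)
            return
        current = current.relationships[side]
    current.relationships["INDEXES"] = target  # key already present: repoint

def lookup(root, key):
    """Follow LEFT/RIGHT relationships until the key is found or we run out."""
    current = root
    while current is not None and key != current.key:
        side = "LEFT" if key < current.key else "RIGHT"
        current = current.relationships.get(side)
    return current.relationships["INDEXES"] if current else None

people = [{"name": n} for n in ("Morpheus", "Cypher", "Trinity", "Agent Smith")]
root = IndexNode("Morpheus", people[0])
for person in people[1:]:
    insert(root, person["name"], person)

found = lookup(root, "Trinity")
```

Because the index is made of ordinary nodes and relationships, a lookup is just another traversal over the graph.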
A domain object implemented in Neo4j
public interface Person {
   String getName();
   void setName( String firstName, String lastName );
}

public final class PersonImpl implements Person {
   private final Node underlyingNode;
   public PersonImpl( Node underlyingNode ) {
       this.underlyingNode = underlyingNode;
   }
   public String getName() {
       return String.format("%s %s",
          underlyingNode.getProperty("first name"),
          underlyingNode.getProperty("last name") );
   }
   // Must return void to match setName(...) in the Person interface
   public void setName(String firstName, String lastName) {
       underlyingNode.setProperty("first name", firstName);
       underlyingNode.setProperty("last name", lastName);
   }
}
Neo4j as Software Transactional Memory
๏ Implement objects as wrappers around Nodes and Relationships
   • Neo4j is fast enough to allow you to read all state from the
      Node/Relationship
๏ Mutating operations require transactions
   • The changes are isolated from all other threads until committed
   • Multiple mutations can be committed atomically
๏ Nested transactions are flattened
   • Makes it possible to have methods open their own transaction
๏ Fits nicely with the OO paradigm
   • More focus on data than on objects (compare with Object DBs)
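The flattening of nested transactions can be sketched with a nesting counter: only the outermost block commits, so a method can open its own transaction whether or not the caller already has one. This is a single-threaded toy in plain Python, not Neo4j's transaction API; Store, _Transaction and set_name are hypothetical names, and get deliberately returns only committed state to model what other readers would see.

```python
class Store:
    def __init__(self):
        self.committed = {}   # state visible to other readers
        self._pending = {}    # writes buffered by the open transaction
        self._depth = 0       # current nesting level
        self._failed = False  # set when any nested block raised

    def transaction(self):
        return _Transaction(self)

    def set(self, key, value):
        if self._depth == 0:
            raise RuntimeError("mutating operations require a transaction")
        self._pending[key] = value

    def get(self, key):
        return self.committed.get(key)


class _Transaction:
    def __init__(self, store):
        self._store = store

    def __enter__(self):
        self._store._depth += 1  # a nested block joins the outer one
        return self

    def __exit__(self, exc_type, exc, traceback):
        store = self._store
        store._depth -= 1
        if exc_type is not None:
            store._failed = True
        if store._depth == 0:  # only the outermost block commits
            if not store._failed:
                store.committed.update(store._pending)
            store._pending = {}
            store._failed = False
        return False  # never swallow exceptions


def set_name(store, first, last):
    # Opens its own transaction; flattened when called inside another one.
    with store.transaction():
        store.set("first name", first)
        store.set("last name", last)


store = Store()
with store.transaction():
    set_name(store, "Thomas", "Anderson")
    visible_inside = store.get("first name")  # inner block did not commit
    store.set("age", 29)
```

After the outer block exits, all three properties become visible at once, which is the atomic commit of multiple mutations described on the slide.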
Why not use an O/R mapper?
๏ Model evolution in ORMs is a hard problem
   • virtually unsupported in most ORM systems
๏ SQL is “compatible” across many RDBMSs
   • data is still locked in
๏ Each ORM maps object models differently
   • Moving to another ORM == legacy schema support
      ‣except your legacy schema is a strange auto-generated one
๏ Object/Graph Mapping is always done the same way
   • allows you to keep your data through application changes
   • or share data between multiple implementations
What an ORM doesn’t do

๏ Deep traversals
๏ Graph algorithms
๏ Shortest path(s)
๏ Routing
๏ etc.
Path exists in social network
๏ Each person has on average 50 friends
๏ The performance impact in Neo4j depends only on the degree of each node.
  In an RDBMS it depends on the number of entries in the tables involved
  in the join(s).

[Diagram: social network with the persons Tobias, Emil, Johan and Peter]

  Database                  # persons    query time
  Relational database           1 000    2 000 ms
  Neo4j Graph Database          1 000    2 ms
  Neo4j Graph Database      1 000 000    2 ms
  Relational database       1 000 000    way too long...
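A "does a path exist" query touches only the friends of the people it visits, which is why its cost follows node degree rather than total data size. Below is a minimal sketch in plain Python of a depth-limited breadth-first search. The friendships between Tobias, Johan, Emil and Peter are invented for the example, as are the function name path_exists and the max_depth default.

```python
from collections import deque

def path_exists(friends, start, goal, max_depth=4):
    """Return True if goal is reachable from start in at most max_depth hops."""
    if start == goal:
        return True
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend == goal:
                return True
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return False

# Hypothetical friendships, stored in both directions.
friends = {
    "Tobias": ["Johan"],
    "Johan": ["Tobias", "Emil"],
    "Emil": ["Johan", "Peter"],
    "Peter": ["Emil"],
}

print(path_exists(friends, "Tobias", "Peter"))               # True
print(path_exists(friends, "Tobias", "Peter", max_depth=2))  # False
```

Each step only iterates over friends.get(person), so adding a million unrelated people to the dictionary would not change the work done for this query.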
On-line real time routing with Neo4j
๏ 20 million Nodes - represent places
๏ 62 million Edges - represent direct roads between places
   • These edges have a length property, for the length of the road
๏ Average optimal route, 100 separate roads, found in 100 ms
๏ Worst case route we could find:
   • Optimal route is 5500 separate roads
   • Total length ~770 km
   • Found in less than 3 seconds
๏ Uses A* “best first” search
๏ There’s a difference between least number of hops and least cost.
Routing with Neo4j - using Neo4j Graph-Algos
# The cost evaluator - for choosing the best next node
class GeoCostEvaluator
    include EstimateEvaluator
    def getCost(node, goal)
        straight_path_distance(
           node.getProperty("lat"), node.getProperty("lon"),
           goal.getProperty("lat"), goal.getProperty("lon") )
    end
end

# Instantiate the A* search function
path_finder = AStar.new( Neo4j::instance,
   RelationshipExpander.forTypes(
       DynamicRelationshipType.withName("road"),
          Direction::BOTH ),
   DoubleEvaluator.new("length"), GeoCostEvaluator.new )

# Find the best path between New York City and San Francisco
best_path = path_finder.findSinglePath( NYC, SF )
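The same idea can be sketched without the Graph-Algos library. The following plain Python version of A* uses the summed road length as the cost and the straight-line (great-circle) distance to the goal as the estimate. The place names, coordinates and road lengths are made up, and a_star and straight_line_km are hypothetical helpers, not part of any Neo4j API.

```python
import heapq
import math

def straight_line_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def a_star(coords, roads, start, goal):
    """Return (total_length, path) for the cheapest route, or None."""
    frontier = [(straight_line_km(coords[start], coords[goal]), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, place, path = heapq.heappop(frontier)
        if place == goal:
            return cost, path
        if cost > best.get(place, math.inf):
            continue  # stale queue entry, a cheaper way here is known
        for neighbour, length in roads.get(place, ()):
            new_cost = cost + length
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                estimate = new_cost + straight_line_km(coords[neighbour], coords[goal])
                heapq.heappush(frontier, (estimate, new_cost, neighbour, path + [neighbour]))
    return None

# Invented places (lat, lon) and road lengths in km; every road is at
# least as long as the straight line, so the estimate never overshoots.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0), "D": (1.0, 1.0)}
roads = {}
for a, b, length in [("A", "B", 112), ("B", "C", 112),
                     ("A", "D", 200), ("D", "C", 200), ("A", "C", 400)]:
    roads.setdefault(a, []).append((b, length))
    roads.setdefault(b, []).append((a, length))

best_cost, best_path = a_star(coords, roads, "A", "C")
```

In this toy network the direct road from A to C is a single hop but 400 km long, while the route through B takes two hops and 224 km, which illustrates the difference between least number of hops and least cost.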
Newest addition: Neo4j lets you REST
๏ Hello Neo4j REST server - Neo4j no longer needs to be embedded
๏ Opens up Neo4j to your favorite platform (even if that isn’t Java)
   • PHP, .NET, etc. - libraries already exist!
   • http://wiki.neo4j.org/content/Getting_Started_REST
๏ Uses JSON for state transfer + browsable HTML for introspection
๏ Atomic modification operations
๏ Brand new declarative traversal framework
   • Extensible using your favorite scripting language
      ‣JavaScript is included. Jython, JRuby, etc. supported
Other cool Graph Databases
๏ Sones GraphDB
   • Graph Query Language - a SQL-like query language for graphs
๏ Franz Inc. AllegroGraph
๏ HypergraphDB
๏ InfoGrid
๏ Twitter’s FlockDB
   • Optimized for the Twitter use case - one level relationships
๏ Interestingly we all have different approaches
One database to rule them all
Up until recently there was only one Database, the RDBMS.
The days of a single database that rules them all are over.

Image credits: The Lord of the Rings, New Line Cinema
Use best suited storage for each kind of data
The era of using RDBMSes for all problems is over. Instead we should use
the database most suited for the problem at hand.

Image credits: Unknown :’(
Polyglot persistence
... we could even use multiple databases in conjunction, and let each
database handle the things it does best.
[Diagram: Document database holding {...} {...} {...}]

Polyglot persistence
SQL && NOSQL
All databases are welcome!
SQL and NOSQL - it is Not Only SQL!
[Diagram: Document database holding {...} {...} {...}]
Finding out more
๏ http://neo4j.org/ - project website
      ‣http://api.neo4j.org/ and http://components.neo4j.org/
      ‣http://wiki.neo4j.org/ - HowTos, Tutorials, Examples, FAQ, etc.
      ‣http://planet.neo4j.org/ - aggregation of blogs about Neo4j
๏ http://neotechnology.com/ - commercial licensing
๏ http://twitter.com/neo4j/team - follow the Neo4j team
๏ http://nosql.mypopescu.com/ - good source for news on NOSQL,
     monitors Neo4j and other NOSQL solutions
๏ http://highscalability.com/ - has published a few articles about Neo4j
Buzzword summary (http://neo4j.org/)
Semi structured, SPARQL, AGPLv3, ACID transactions, Open Source,
Object mapping, Gremlin, Shortest path, In-Graph indexes, NOSQL,
A* routing, whiteboard friendly, RESTful, Traversal, Query language,
Embedded, Beer, Schema free, Software Transactional Memory,
Right tool for the right job, Scaling to complexity, Free Software,
Polyglot persistence
http://neotechnology.com

NOSQLEU - Graph Databases and Neo4j

  • 1. Graph Databases and Neo4j twitter: @thobe / #neo4j Tobias Ivarsson email: [email protected] web: http://www.neo4j.org/ Hacker @ Neo Technology web: http://www.thobe.org/
  • 2. NOSQL - Why now? Four trends 2
  • 3. Trend 1: Data size ExaBytes (10¹⁸) of data stored per year 988 1000 Each year more and more digital data is created. Over t wo 750 years we create more digital data than all 623 the data created in history before that. 500 397 253 250 161 0 2006 2007 2008 2009 2010 Data source: IDC 2007 3
  • 4. Trend 2: Connectedness Giant Global Graph (GGG) Over time data has evolved to Ontologies be more and more interlinked and connected. RDF Hypertext has links, Blogs have pingback, Tagging groups all related data Folksonomies Information connectivity Tagging Wikis User-generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 4
  • 5. Trend 3: Semi-structure ๏ Individualization of content • In the salary lists of the 1970s, all elements had exactly one job • In Or 15? lists of the 2000s, we need 5 job columns! Or 8? the salary ๏ All encompassing “entire world views” • Store more data about each entity ๏ Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”) 5
  • 6. Trend 4: Architecture 1980s: Mainframe applications Application DB 6
  • 7. Trend 4: Architecture 1990s: Database as integration hub Application Application Application DB 7
  • 8. Trend 4: Architecture 2000s: (moving towards) Decoupled services with their own backend Application Application Application DB DB DB 8
  • 9. Why NOSQL Now? ๏Trend 1: Size ๏Trend 2: Connectedness ๏Trend 3: Semi-structure ๏Trend 4: Architecture 9
  • 10. RDBMS performance Salary List Relational database Requirement of application Performance Majority of Webapps Social network We are building } applications today that Semantic Trading have complexity requirements that a Relational Database cannot handle with sufficient performance custom Data complexity 10
  • 11. Scaling to size vs. Scaling to complexity Size Key/Value stores Bigtable clones Document databases Graph databases Billions of nodes and relationships > 90% of use cases Complexity 11
  • 12. Graph Databases focuses on structure of data Graph databases focus on the structure of the data, scaling to the complexity of the data and of the application. 12
  • 13. What is Neo4j? ๏ Neo4j is a Graph Database • Non-relational (“#nosql”), transactional (ACID), embedded • Data is stored as a Graph / Network ‣Nodes and relationships with properties ‣“Property Graph” or “edge-labeled multidigraph” • Schema free, bottom-up data model design ๏ Neo4j is Open Source / Free (as in speech) Software Prices are available at http://neotechnology.com/ • AGPLv3 Contact us if you have questions and/or special license needs (e.g. if you • Commercial (“dual license”) license available want an evaluation license) ‣First server is free (as in beer), next is inexpensive 13
  • 14. More about Neo4j ๏ Neo4j is stable • In 24/7 operation since 2003 ๏ Neo4j is in active development • Neo Technology received VC funding October 2009 ๏ Neo4j delivers high performance graph operations • traverses 1’000’000+ relationships / second on commodity hardware 14
  • 15. The Neo4j Graph data model •Nodes •Relationships bet ween Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 16. The Neo4j Graph data model •Nodes •Relationships bet ween Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 17. The Neo4j Graph data model LIVES WITH LOVES OWNS DRIVES •Nodes •Relationships bet ween Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 18. The Neo4j Graph data model LOVES LIVES WITH LOVES OWNS DRIVES •Nodes •Relationships bet ween Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 19. The Neo4j Graph data model name: “Mary” LOVES name: “James” age: 35 age: 32 LIVES WITH twitter: “@spam” LOVES OWNS DRIVES •Nodes •Relationships bet ween Nodes •Relationships have Labels brand: “Volvo” •Relationships are directed, but traversed at model: “V70” equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 20. The Neo4j Graph data model
      • Nodes
      • Relationships between Nodes
      • Relationships have Labels
      • Relationships are directed, but traversed at equal speed in both directions
      • The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
      • Nodes have key-value properties
      • Relationships have key-value properties
      Example graph on the slide:
      • Node with name: “Mary”, age: 35
      • Node with name: “James”, age: 32, twitter: “@spam”
      • Node with brand: “Volvo”, model: “V70”
      • Relationships: LOVES (one in each direction), LIVES WITH, OWNS (item type: “car”), DRIVES
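      The data model listed on this slide can be sketched in a few lines of Python. This is an in-memory illustration only, not the Neo4j API: `Node`, `Relationship` and `neighbours` are names made up for the example, and which person owns or drives the car is an assumption, because the transcript does not say.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A node with key-value properties and the relationships attached to it."""
    properties: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    """A directed, labelled relationship that carries its own properties."""
    start: Node
    end: Node
    label: str
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Register on both endpoints so either end can follow it at the same cost.
        self.start.relationships.append(self)
        self.end.relationships.append(self)


def neighbours(node, label, direction="both"):
    """Return the nodes reachable from `node` over `label` in the given direction."""
    found = []
    for rel in node.relationships:
        if rel.label != label:
            continue
        if rel.start is node and direction in ("outgoing", "both"):
            found.append(rel.end)
        if rel.end is node and direction in ("incoming", "both"):
            found.append(rel.start)
    return found


mary = Node({"name": "Mary", "age": 35})
james = Node({"name": "James", "age": 32, "twitter": "@spam"})
car = Node({"brand": "Volvo", "model": "V70"})

Relationship(mary, james, "LOVES")
Relationship(james, mary, "LOVES")
Relationship(mary, james, "LIVES WITH")  # stored once, read in both directions
Relationship(james, car, "OWNS", {"item type": "car"})  # assumed owner
Relationship(mary, car, "DRIVES")  # assumed driver

lives_with_james = [n.properties["name"] for n in neighbours(james, "LIVES WITH")]
car_owners = [n.properties["name"] for n in neighbours(car, "OWNS", "incoming")]
```

      LIVES WITH is stored as a single directed relationship, yet James can still find Mary through it; the application simply chooses to ignore the direction for that label.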
  • 21. Graphs are all around us
             A      B       C      D                  ...
        1    17     3.14    3      17.79333333333
        2    42     10.11   14     30.33
        3    316    6.66    1      2104.56
        4    32     9.11    592    0.492432432432
        5                          2153.175765766
        ...
      Even if this spreadsheet looks like it could be a fit for an RDBMS, it isn’t:
      • RDBMSes have problems with extending indefinitely on both rows and columns
      • Formulas and data dependencies would quickly lead to heavy join operations
  • 22. Graphs are all around us
             A      B       C      D                  ...
        1    17     3.14    3      = A1 * B1 / C1
        2    42     10.11   14     = A2 * B2 / C2
        3    316    6.66    1      = A3 * B3 / C3
        4    32     9.11    592    = A4 * B4 / C4
        5                          = SUM(D1:D4)
        ...
      With data dependencies the spreadsheet turns out to be a graph.
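      The idea that a spreadsheet is a graph can be sketched by treating every cell as a node and every formula reference as a dependency edge. The snippet below is an illustration under assumptions: the `cells` dictionary and the `evaluate` function are invented for this example, and D5 is taken to be the sum of D1 to D4, which matches the total 2153.175765766 shown with the computed values.

```python
# Each cell is either a constant or a (function, referenced cells) pair.
cells = {
    "A1": 17, "B1": 3.14, "C1": 3,
    "A2": 42, "B2": 10.11, "C2": 14,
    "A3": 316, "B3": 6.66, "C3": 1,
    "A4": 32, "B4": 9.11, "C4": 592,
}
for row in "1234":
    refs = ("A" + row, "B" + row, "C" + row)
    cells["D" + row] = (lambda a, b, c: a * b / c, refs)
cells["D5"] = (lambda *values: sum(values), ("D1", "D2", "D3", "D4"))


def evaluate(name, cache=None, in_progress=None):
    """Evaluate a cell by following its dependency edges depth first."""
    cache = {} if cache is None else cache
    in_progress = set() if in_progress is None else in_progress
    if name in cache:
        return cache[name]
    if name in in_progress:
        raise ValueError("circular reference at " + name)
    in_progress.add(name)
    cell = cells[name]
    if isinstance(cell, tuple):
        formula, refs = cell
        value = formula(*(evaluate(ref, cache, in_progress) for ref in refs))
    else:
        value = cell
    in_progress.discard(name)
    cache[name] = value
    return value


total = evaluate("D5")
```

      Evaluating D5 walks the dependency edges down to the constants in columns A to C. In a relational schema each of those hops would typically become a join.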
  • 24. Graphs are all around us
      If we add external data sources the problem becomes even more interesting...
        17     3.14    3      = A1 * B1 / C1
        42     10.11   14     = A2 * B2 / C2
        316    6.66    1      = A3 * B3 / C3
        32     9.11    592    = A4 * B4 / C4
                              = SUM(D1:D4)
  • 28. Graphs are whiteboard friendly
      An application domain model outlined on a whiteboard or piece of paper would be translated to an ER diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.
      Figure labels (whiteboard graph): thobe, Joe, project blog, Wardrobe Strength, Hello Joe, Modularizing Jython, Neo4j performance analysis
      Image credits: Tobias Ivarsson
  • 29. Query Languages
      ๏ Traversal APIs
        • Neo4j core traversers
        • Blueprint pipes
      ๏ SPARQL - “SQL for linked data” - query by graph pattern matching
          SELECT ?person WHERE {
            ?person neo4j:KNOWS ?friend .
            ?friend neo4j:KNOWS ?foe .
            ?foe neo4j:name "Larry Ellison" .
          }
        Find all persons that KNOWS a friend that KNOWS someone named “Larry Ellison”.
      ๏ Gremlin - “perl for graphs” - query by traversal
          ./outE[@label='KNOWS']/inV[@age > 30]/@name
        Give me the names of all the people I know that are older than 30.
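      The difference between the two query styles on this slide can be sketched in plain Python. Nothing here is SPARQL or Gremlin: the `edges` and `props` sample data and the `out` helper are hypothetical, and the snippet only mimics what the two example queries ask for.

```python
# Hypothetical sample data: (start, label, end) triples plus node properties.
edges = [
    ("me", "KNOWS", "alice"),
    ("me", "KNOWS", "bob"),
    ("alice", "KNOWS", "carol"),
    ("carol", "KNOWS", "larry"),
]
props = {
    "me": {"name": "Me", "age": 28},
    "alice": {"name": "Alice", "age": 34},
    "bob": {"name": "Bob", "age": 25},
    "carol": {"name": "Carol", "age": 41},
    "larry": {"name": "Larry Ellison"},
}


def out(node, label):
    """Follow outgoing edges with the given label from one node."""
    return [end for start, lbl, end in edges if start == node and lbl == label]


# Query by traversal: start at one node and walk outwards, filtering on the way.
older_friends = [props[f]["name"] for f in out("me", "KNOWS") if props[f]["age"] > 30]

# Query by pattern matching: find every person that fits the whole pattern.
matches = sorted(
    person
    for person in props
    for friend in out(person, "KNOWS")
    for foe in out(friend, "KNOWS")
    if props[foe]["name"] == "Larry Ellison"
)
```

      The traversal needs a start node and only touches its neighbourhood, while the pattern match in this naive form scans every node as a candidate for `?person`.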
  • 31. Data manipulation API
      GraphDatabaseService graphDb = getGraphDbInstanceSomehow();
      // Mutating operations run inside a transaction
      Transaction tx = graphDb.beginTx();
      try {
          // Create Thomas 'Neo' Anderson
          Node mrAnderson = graphDb.createNode();
          mrAnderson.setProperty( "name", "Thomas Anderson" );
          mrAnderson.setProperty( "age", 29 );
          // Create Morpheus
          Node morpheus = graphDb.createNode();
          morpheus.setProperty( "name", "Morpheus" );
          morpheus.setProperty( "rank", "Captain" );
          morpheus.setProperty( "occupation", "Total badass" );
          // Create relationship representing they know each other
          // (RelTypes is assumed to be an application-defined enum implementing RelationshipType)
          mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
          // ... similarly for Trinity, Cypher, Agent Smith, Architect
          tx.success();
      } finally {
          tx.finish();
      }
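      The `tx.success()` / `tx.finish()` idiom can be illustrated with a toy transaction in Python. This is a sketch of the pattern only, under the assumption that `finish()` commits when `success()` was called and discards the changes otherwise; the `Transaction` class and `create_person` function are invented for the example and are not the Neo4j implementation.

```python
class Transaction:
    """Toy transaction: buffers writes and applies them only on success."""

    def __init__(self, store):
        self.store = store
        self.pending = {}
        self.successful = False

    def set(self, key, value):
        self.pending[key] = value

    def success(self):
        # Only marks the transaction; nothing is written yet.
        self.successful = True

    def finish(self):
        # Commit if marked successful, otherwise discard the buffered writes.
        if self.successful:
            self.store.update(self.pending)
        self.pending = {}


def create_person(store, name, age):
    tx = Transaction(store)
    try:
        tx.set("name", name)
        if age < 0:
            raise ValueError("age must not be negative")
        tx.set("age", age)
        tx.success()
    finally:
        tx.finish()


committed = {}
create_person(committed, "Thomas Anderson", 29)

rolled_back = {}
try:
    create_person(rolled_back, "Morpheus", -1)
except ValueError:
    pass  # finish() still ran, but without success() nothing was written
```

      Putting `finish()` in the `finally` block means the transaction is always closed, and an exception raised before `success()` leaves the store untouched.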
  • 39. Graph traversals
      Figure (the Matrix graph):
      • Nodes: name: “Thomas Anderson”, age: 29 / name: “Morpheus”, rank: “Captain”, occupation: “Total badass” / name: “Trinity” / name: “Cypher”, last name: “Reagan” / name: “Agent Smith”, version: “1.0b”, language: “C++” / name: “The Architect”
      • Relationships: KNOWS (five), LOVES, CODED BY
      • Relationship properties: disclosure: “public”, disclosure: “secret”, since: “meeting the oracle”, since: “a year before the movie”, cooperates on: “The Nebuchadnezzar”
      import neo4j
      class Friends(neo4j.Traversal): # Traversals ≈ queries in Neo4j
          types = [ neo4j.Outgoing.KNOWS ]
          order = neo4j.BREADTH_FIRST
          stop = neo4j.STOP_AT_END_OF_GRAPH
          returnable = neo4j.RETURN_ALL_BUT_START_NODE
      for friend_node in Friends(mr_anderson):
          print "%s (@ depth=%s)" % ( friend_node["name"], friend_node.depth )
      Output:
      Morpheus (@ depth=1)
      Trinity (@ depth=1)
      Cypher (@ depth=2)
      Agent Smith (@ depth=3)
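      What the `Friends` traversal does can be sketched without Neo4j as a plain breadth-first search. The `knows` edge list below is an assumption: the transcript only shows relationship labels, so the outgoing KNOWS edges were chosen to be consistent with the depths printed on the slide. The `friends` generator is a made-up name.

```python
from collections import deque

# Assumed edge list: outgoing KNOWS relationships, consistent with the printed depths.
knows = {
    "Thomas Anderson": ["Morpheus", "Trinity"],
    "Morpheus": ["Trinity", "Cypher"],
    "Trinity": [],
    "Cypher": ["Agent Smith"],
    "Agent Smith": [],
}


def friends(start):
    """Yield (name, depth) breadth first, skipping the start node."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth > 0:
            yield name, depth
        for other in knows.get(name, []):
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))


result = list(friends("Thomas Anderson"))
for name, depth in result:
    print("%s (@ depth=%s)" % (name, depth))
```

      The `seen` set plays the role of the traverser's uniqueness check: Trinity is reachable through both Thomas Anderson and Morpheus, but is reported once, at the shallowest depth.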
  • 40. Finding a place to start
      ๏ Traversals need a Node to start from
        • QUESTION: How do I find the start Node?
        • ANSWER: You use an Index
      ๏ Indexes in Neo4j are different from Indexes in Relational Databases
        • RDBMSes use them for Joining
        • Neo4j uses them for simple lookup
      IndexService index = getGraphDbIndexServiceSomehow();
      Node mrAnderson = index.getSingleNode( "name", "Thomas Anderson" );
      performTraversalFrom( mrAnderson );
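      The "index for simple lookup" idea amounts to an exact-match map from a key and value to a node. The sketch below is a stand-in written for this transcript, not the Neo4j `IndexService`: the `SimpleIndex` class and its methods are hypothetical, and nodes are represented as plain dictionaries.

```python
class SimpleIndex:
    """Toy exact-match index: (key, value) -> list of nodes."""

    def __init__(self):
        self._entries = {}

    def index(self, node, key, value):
        self._entries.setdefault((key, value), []).append(node)

    def get_single_node(self, key, value):
        hits = self._entries.get((key, value), [])
        if len(hits) > 1:
            raise LookupError("more than one node for %s=%r" % (key, value))
        return hits[0] if hits else None


nodes = [{"name": "Thomas Anderson", "age": 29}, {"name": "Morpheus"}]
index = SimpleIndex()
for node in nodes:
    index.index(node, "name", node["name"])

start = index.get_single_node("name", "Thomas Anderson")
missing = index.get_single_node("name", "Trinity")
```

      The index is consulted once to find the start node; everything after that follows relationships rather than doing further lookups.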
  • 41. Indexes in Neo4j
      ๏ The Graph *is* the main index
        • Use relationship labels for navigation
        • Build index structures *in the graph*
          ‣ Search trees, tag clouds, geospatial indexes, etc.
          ‣ Linked/skip lists or other data structures in the graph
          ‣ We have utility libraries for this
      ๏ External indexes used *for lookup*
        • Finding one or more points to start traversals from
        • Major difference from RDBMSes, which use indexes for everything
  • 42. A domain object implemented in Neo4j
      public interface Person {
          String getName();
          void setName( String firstName, String lastName );
      }
      public final class PersonImpl implements Person {
          // All state lives in the node; the object is only a thin wrapper
          private final Node underlyingNode;
          public PersonImpl( Node underlyingNode ) {
              this.underlyingNode = underlyingNode;
          }
          public String getName() {
              return String.format("%s %s",
                  underlyingNode.getProperty("first name"),
                  underlyingNode.getProperty("last name") );
          }
          // Return type is void to match the interface
          public void setName(String firstName, String lastName) {
              underlyingNode.setProperty("first name", firstName);
              underlyingNode.setProperty("last name", lastName);
          }
      }
  • 43. Neo4j as Software Transactional Memory
      ๏ Implement objects as wrappers around Nodes and Relationships
        • Neo4j is fast enough to allow you to read all state from the Node/Relationship
      ๏ Mutating operations require transactions
        • The changes are isolated from all other threads until committed
        • Multiple mutations can be committed atomically
      ๏ Nested transactions are flattened
        • Makes it possible to have methods open their own transaction
      ๏ Fits nicely with the OO paradigm
        • More focus on data than on objects (compared to Object DBs)
  • 44. Why not use an O/R mapper?
      ๏ Model evolution in ORMs is a hard problem
        • Virtually unsupported in most ORM systems
      ๏ SQL is “compatible” across many RDBMSs
        • Data is still locked in
      ๏ Each ORM maps object models differently
        • Moving to another ORM == legacy schema support
          ‣ Except your legacy schema is a strange auto-generated one
      ๏ Object/Graph Mapping is always done the same way
        • Allows you to keep your data through application changes
        • Or share data between multiple implementations
  • 45. What an ORM doesn’t do
      ๏ Deep traversals
      ๏ Graph algorithms
      ๏ Shortest path(s)
      ๏ Routing
      ๏ etc.
  • 46. Path exists in social network
      ๏ Each person has on average 50 friends
      The performance impact in Neo4j depends only on the degree of each node. In an RDBMS it depends on the number of entries in the tables involved in the join(s).
      Figure labels (social graph): Tobias, Emil, Johan, Peter
        Database                 # persons    Query time
        Relational database      1 000        2 000 ms
        Neo4j Graph Database     1 000        2 ms
        Neo4j Graph Database     1 000 000    2 ms
        Relational database      1 000 000    way too long...
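      The claim that the cost depends on node degree rather than on the total number of persons can be illustrated with a depth-limited breadth-first search. This is a sketch, not a benchmark: the friendships between the people named in the figure are made up, and `path_exists` is a hypothetical helper that also counts how many persons it had to expand.

```python
from collections import deque


def path_exists(friends, start, goal, max_depth):
    """Breadth-first search; returns (found, number of persons expanded)."""
    seen = {start}
    queue = deque([(start, 0)])
    expanded = 0
    while queue:
        person, depth = queue.popleft()
        if person == goal:
            return True, expanded
        if depth == max_depth:
            continue
        expanded += 1
        for friend in friends.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return False, expanded


# Hypothetical friendships between the people named on the slide.
small = {"Tobias": ["Emil", "Johan"], "Emil": ["Peter"], "Johan": [], "Peter": []}

# The same network plus 100 000 unrelated people in a long chain.
large = dict(small)
for i in range(100000):
    large["p%d" % i] = ["p%d" % (i + 1)]

small_result = path_exists(small, "Tobias", "Peter", max_depth=4)
large_result = path_exists(large, "Tobias", "Peter", max_depth=4)
```

      Both searches expand the same three persons, because the extra people are never reachable from Tobias. A join-based approach would instead have to work against tables that are now far larger.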
  • 51. On-line real time routing with Neo4j
      ๏ 20 million Nodes - represent places
      ๏ 62 million Edges - represent direct roads between places
        • These edges have a length property, for the length of the road
      ๏ Average optimal route, 100 separate roads, found in 100ms
      ๏ Worst case route we could find:
        • Optimal route is 5500 separate roads
        • Total length ~770km
        • Found in less than 3 seconds
      ๏ Uses A* “best first” search
      There’s a difference between least number of hops and least cost.
  • 52. Routing with Neo4j - using Neo4j Graph-Algos
      # The cost evaluator - for choosing the best next node
      class GeoCostEvaluator
        include EstimateEvaluator
        def getCost(node, goal)
          straight_path_distance(
            node.getProperty("lat"), node.getProperty("lon"),
            goal.getProperty("lat"), goal.getProperty("lon") )
        end
      end
      # Instantiate the A* search function
      path_finder = AStar.new( Neo4j::instance,
        RelationshipExpander.forTypes(
          DynamicRelationshipType.withName("road"), Direction::BOTH ),
        DoubleEvaluator.new("length"),
        GeoCostEvaluator.new )
      # Find the best path between New York City and San Francisco
      best_path = path_finder.findSinglePath( NYC, SF )
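      The A* search used above can be sketched as a standalone Python function. This is not the Graph-Algos implementation: `a_star`, `straight_line_km`, the four places and their road lengths are all made up for illustration. The road lengths were chosen to be at least the straight-line distance, so the estimate never overestimates the remaining cost.

```python
import heapq
import math


def straight_line_km(a, b):
    """Great-circle distance between two (lat, lon) pairs, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def a_star(roads, coords, start, goal):
    """Return (total length, path) of the cheapest route, or None."""
    frontier = [(straight_line_km(coords[start], coords[goal]), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, place, path = heapq.heappop(frontier)
        if place == goal:
            return cost, path
        if cost > best.get(place, math.inf):
            continue  # stale queue entry, a cheaper way here was found later
        for neighbour, length in roads.get(place, ()):
            new_cost = cost + length
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                estimate = new_cost + straight_line_km(coords[neighbour], coords[goal])
                heapq.heappush(frontier, (estimate, new_cost, neighbour, path + [neighbour]))
    return None


# Hypothetical places (lat, lon) and two-way roads with a length in km.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (1.0, 0.5), "D": (0.0, 2.0)}
roads = {}
for a, b, km in [("A", "B", 120), ("B", "D", 120), ("A", "C", 130),
                 ("C", "D", 210), ("A", "D", 300)]:
    roads.setdefault(a, []).append((b, km))
    roads.setdefault(b, []).append((a, km))  # mirrors Direction::BOTH

result = a_star(roads, coords, "A", "D")
```

      The direct road from A to D is a single hop but 300 km long, while the route through B takes two hops and 240 km. The search returns the latter, which is the difference between fewest hops and least cost.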
  • 53. Newest addition: Neo4j lets you REST
      ๏ Hello Neo4j REST server - Neo4j no longer needs to be embedded
      ๏ Opens up Neo4j to your favorite platform (even if that isn’t Java)
        • PHP, .NET, etc. - libraries already exist!
        • http://wiki.neo4j.org/content/Getting_Started_REST
      ๏ Uses JSON for state transfer + browsable HTML for introspection
      ๏ Atomic modification operations
      ๏ Brand new declarative traversal framework
        • Extensible using your favorite scripting language
          ‣ JavaScript is included. Jython, JRuby, etc. supported
  • 54. Other cool Graph Databases
      ๏ Sones GraphDB
        • Graph Query Language - a SQL-like query language for graphs
      ๏ Franz Inc. AllegroGraph
      ๏ HypergraphDB
      ๏ InfoGrid
      ๏ Twitter’s FlockDB
        • Optimized for the Twitter use case - one level relationships
      ๏ Interestingly we all have different approaches
  • 55. Up until recently there was only one Database, the RDBMS. The days of a single database that rules them all are over.
      One database to rule them all
      Image credits: The Lord of the Rings, New Line Cinema
  • 56. Use best suited storage for each kind of data
      The era of using RDBMSes for all problems is over. Instead we should use the database most suited for the problem at hand.
      Image credits: Unknown :’(
  • 57. Polyglot persistence
      ... we could even use multiple databases in conjunction, and let each database handle the things it does best.
      Figure label: Document {...}
  • 58. Polyglot persistence
      SQL && NOSQL
      All databases are welcome! SQL and NOSQL - it is Not Only SQL!
      Figure label: Document {...}
  • 59. Finding out more
      ๏ http://neo4j.org/ - project website
        ‣ http://api.neo4j.org/ and http://components.neo4j.org/
        ‣ http://wiki.neo4j.org/ - HowTos, Tutorials, Examples, FAQ, etc.
        ‣ http://planet.neo4j.org/ - aggregation of blogs about Neo4j
      ๏ http://neotechnology.com/ - commercial licensing
      ๏ http://twitter.com/neo4j/team - follow the Neo4j team
      ๏ http://nosql.mypopescu.com/ - good source for news on NOSQL; monitors Neo4j and other NOSQL solutions
      ๏ http://highscalability.com/ - has published a few articles about Neo4j
  • 60. Buzzword summary
      http://neo4j.org/
      Semi structured, SPARQL, AGPLv3, ACID transactions, Open Source, Object mapping, Gremlin, Shortest path, In-Graph indexes, NOSQL, A* routing, whiteboard friendly, RESTful, Traversal, Query language, Embedded, Beer, Schema free, Software Transactional Memory, Right tool for the right job, Scaling to complexity, Free Software, Polyglot persistence