IBM Research – Industries & Solutions – Business Solutions & Mathematical Sciences
Recent Updates on IBM System G
— GraphBIG and Temporal Data
Yinglong Xia
IBM T.J. Watson Research Center
Yorktown Heights, NY 10598
© 2014 International Business Machines Corporation
Using LDBC-SNB for GraphBIG
• GraphBIG = Graph Benchmark Suite from IBM System G and GaTech HPArch
• A wide selection of workloads for both CPU and GPU
• Workloads ranging from graph traversal to Gibbs sampling on Bayesian networks
• Illustrates the impact of processor architecture using hardware performance counters
• Fix input data and implementation
• Show performance profiling at processor architecture level
Beyond the Benchmarking for Graph DBs
• Graph computing was barely considered in architecture design
• Increasing motivation due to popularity of graph analytics
• Measuring the impact of architecture requires fixed input data and a fixed analytic implementation
Demanding Graph
• Interactions of entities in many big data applications are naturally modeled by property graphs
• The evolution of graph structure and properties over time usually provides useful information, which needs to be maintained for queries and analytics
• The graph analytics market is growing quickly, as are graph data size and complexity, yet near-real-time response is typically required
Xiaoyan Fu, Seok-Hee Hong, Nikola S. Nikolov, Xiaobin Shen, Ying Xin Wu and Kai Xu,
Visualization and Analysis of Email Networks, Proceedings of APVIS 2007, IEEE, pp.1-8, 2007.
Use Case: Forensic Analysis on Individual Status
• Recover the dynamics of individual status
• Evaluate status measures, anomalies, etc.
• Propagate known status measures
• Estimate labels for each person at each time stamp
• Aggregate the received measures
Chain Graph: A collection of graphs on contiguous time steps
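The propagate-and-aggregate steps above can be pictured with a small sketch. Everything here is an illustrative assumption rather than the System G implementation: the function name `propagate_once`, the dictionary layout, and the choice of a plain average as the aggregation rule are all hypothetical, since the slides do not specify them. A chain graph is represented as a list of snapshots, one per time step, and a person with no measure receives measures from neighbours in the same snapshot and from the same person at adjacent time steps.

```python
def propagate_once(snapshots, measures):
    """One propagation round over a chain graph (illustrative sketch).

    snapshots[t] maps a person to the list of neighbours at time step t.
    measures maps (t, person) to a known or estimated status measure.
    Returns a new dict; known measures are kept, and each unlabelled
    (t, person) that received at least one measure gets their average.
    """
    updated = dict(measures)
    num_steps = len(snapshots)
    for t, graph in enumerate(snapshots):
        for person, neighbours in graph.items():
            if (t, person) in measures:
                continue  # keep already-known measures fixed
            # Measures received from neighbours within the same time step.
            received = [measures[(t, n)] for n in neighbours
                        if (t, n) in measures]
            # Measures received from the same person at adjacent time steps.
            for s in (t - 1, t + 1):
                if 0 <= s < num_steps and (s, person) in measures:
                    received.append(measures[(s, person)])
            if received:
                # Aggregation rule is an assumption: a simple average.
                updated[(t, person)] = sum(received) / len(received)
    return updated


# Two time steps; "a" and "b" interact only at step 0.
snapshots = [{"a": ["b"], "b": ["a"]}, {"a": [], "b": []}]
known = {(0, "a"): 1.0, (1, "b"): 0.0}
estimated = propagate_once(snapshots, known)
```

Repeating the round until the estimates stop changing would give every person an estimate at every time stamp that is reachable from a known measure.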
Use Case: Bitemporal Data Exploration
• Support both the valid-time dimension and the transaction-time dimension
• Audit trail of what you knew and when you knew it
• History of how the business-perspective history was stored in the database
http://bitemporalmodeling.com/temporal-data-blog/
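As a rough illustration of the two dimensions, the sketch below stores each fact with a valid-time interval (when it was true in the world) and a transaction-time interval (when the database believed it). The class `BitemporalFact`, the function `as_of`, and the numeric timestamps are hypothetical names and values chosen for this example; they are not taken from System G.

```python
from dataclasses import dataclass

FOREVER = float("inf")


@dataclass(frozen=True)
class BitemporalFact:
    key: str
    value: str
    valid_from: float   # when the fact starts being true in the world
    valid_to: float     # when it stops being true (exclusive)
    tx_from: float      # when the database recorded it
    tx_to: float = FOREVER  # when the database superseded it (exclusive)


def as_of(facts, key, valid_time, tx_time):
    """Return the value for `key` that held at `valid_time`,
    according to what the database knew at `tx_time`."""
    for f in facts:
        if (f.key == key
                and f.valid_from <= valid_time < f.valid_to
                and f.tx_from <= tx_time < f.tx_to):
            return f.value
    return None


# At transaction time 10 the status is recorded as "active" from time 0 on.
# At transaction time 20 a correction arrives: "suspended" since time 5.
facts = [
    BitemporalFact("alice.status", "active", 0, FOREVER, tx_from=10, tx_to=20),
    BitemporalFact("alice.status", "active", 0, 5, tx_from=20),
    BitemporalFact("alice.status", "suspended", 5, FOREVER, tx_from=20),
]

believed_before_fix = as_of(facts, "alice.status", valid_time=7, tx_time=15)
believed_after_fix = as_of(facts, "alice.status", valid_time=7, tx_time=25)
```

The superseded row is closed rather than deleted, which is what preserves the audit trail: the first query still answers with what was believed before the correction.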
Graph Data Management
Sparksee, Neo4j, Titan
Organization of Graph Store
Organize Temporal Graph Data
Configuration (Name: Default Value)
• vertex_history: Disabled
• num_vertex_property_bundles: 0
• edge_history: Disabled
• num_edge_property_bundles: 0
• …
Vertex Record Table (a vertex record is accessed by VID * sizeof(VtxRec))
• Flag (uint8)
• inEdge (uint64)
• inEdge Count (uint16)
• outEdge (uint64)
• outEdge Count (uint16)
• Property (uint64)
• Property Count (uint64)
• History (uint64)
• …
inEdge List
• Prev
• Edge_list_buffer<EID,VID,LID>
• inEdgeCount * sizeof(<EID,VID,LID>) points to the buffer end
Edge Record Table (an edge record is accessed by EID * sizeof(EdgeRec))
• Flag
• Property
• Property Count
• History
• …
Vertex Property Table
• Prev
• property_buffer
• PropertyCount points to the buffer end
Edge Property Table
• Prev
• property_buffer
Local Configuration (Name: Default Value)
• min_VID: 0
• max_VID
• min_EID: 0
• max_EID
• …
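The fixed-size vertex record and the "VID * sizeof(VtxRec)" addressing can be sketched with Python's `struct` module. The field order and widths follow the slide; the packed little-endian layout with no padding, and the names `VTX_REC`, `write_vertex`, and `read_vertex`, are assumptions made for this example and may not match the actual on-disk format.

```python
import struct

# Flag (uint8), inEdge (uint64), inEdge Count (uint16), outEdge (uint64),
# outEdge Count (uint16), Property (uint64), Property Count (uint64),
# History (uint64). "<" means little-endian with no padding (an assumption).
VTX_REC = struct.Struct("<BQHQHQQQ")


def write_vertex(table, vid, fields):
    """Store a record at byte offset VID * sizeof(VtxRec)."""
    VTX_REC.pack_into(table, vid * VTX_REC.size, *fields)


def read_vertex(table, vid):
    """Fetch a record in constant time from the same offset."""
    return VTX_REC.unpack_from(table, vid * VTX_REC.size)


# A table with room for 8 vertex records, initially zero-filled.
table = bytearray(VTX_REC.size * 8)

# Hypothetical values: flag, inEdge offset, inEdge count, outEdge offset,
# outEdge count, property offset, property count, history offset.
record = (1, 100, 2, 200, 3, 300, 4, 0)
write_vertex(table, 3, record)
loaded = read_vertex(table, 3)
```

Because every record has the same size, looking up a vertex needs no index: the ID alone determines the offset. The same idea applies to the edge record table with EID * sizeof(EdgeRec).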
Pointer Jumping in Temporal Graph Inference
• Converting a chain (temporal) graph into a tridiagonal system
• Forward Gaussian elimination by propagation
• Backward substitution to produce solutions
• A parallel solution to the Thomas algorithm, which solves a tridiagonal linear system
• Apply pointer jumping to the Thomas algorithm
• Logarithmic speedup
• Propagate belief among vertices within and across time stamps
Speedup with respect to Gaussian elimination: T^3 / log T
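For reference, the two building blocks named above can be sketched separately. The first function is the standard serial Thomas algorithm (forward elimination, then backward substitution). The second shows pointer jumping on its own, ranking a chain of time steps in a logarithmic number of synchronous rounds. This is not the parallel solver from the talk, which combines the two; the function names and the list-based representation are assumptions made for illustration.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c, and right-hand side d (all of length n;
    a[0] and c[-1] are ignored). Assumes no zero pivots arise."""
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: propagate along the chain from first to last.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Backward substitution: recover the solution from last to first.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pointer_jump_ranks(succ):
    """succ[i] is the step after i; the last step points to itself.
    Returns (rank, rounds), where rank[i] is the distance from i to the
    end of the chain. Each round reads the old arrays and writes new
    ones, mimicking one synchronous parallel step."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]  # jump over one pointer
        rounds += 1
    return rank, rounds


# 3x3 system whose exact solution is x = [1, 2, 3].
solution = thomas_solve([0, 1, 1], [2, 2, 2], [1, 1, 0], [4, 8, 8])

# A chain of 8 time steps: 0 -> 1 -> ... -> 7.
ranks, rounds = pointer_jump_ranks([1, 2, 3, 4, 5, 6, 7, 7])
```

The serial loops take a number of steps linear in the chain length T, while pointer jumping halves the remaining distance every round, which is where the logarithmic factor in the stated speedup comes from.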
Comments and Questions?

Towards Temporal Graph Management and Analytics