© 2013 IBM Corporation
IBM DB2 Analytics Accelerator (IDAA)
Near Real-Time Analytics with IDAA
March 2013
Daniel Martin (danmartin@de.ibm.com) – IBM Software Group, Information Management
Disclaimer
© Copyright IBM Corporation 2012. All rights reserved.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without
notice at IBM’s sole discretion. Information regarding potential future products is intended to outline
our general product direction and it should not be relied on in making a purchasing decision. The
information mentioned regarding potential future products is not a commitment, promise, or legal
obligation to deliver any material, code or functionality. Information about potential future products may
not be incorporated into any contract. The development, release, and timing of any future features or
functionality described for our products remains at our sole discretion.
IBM, the IBM logo, ibm.com, DB2, and DB2 for z/OS are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first
occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law
trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law
trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Introduction & Overview
IBM DB2 Analytics Accelerator (IDAA)
Concept: Transparently accelerate analytical queries by offloading them dynamically (the DB2 optimizer decides) to a data warehouse appliance, with no application change.
• Transparency: applications connected to DB2 are entirely unaware of the Accelerator
• Integration: deep integration with DB2 (security, monitoring, backup, ...)
• Self-managed workloads: queries are executed in the most efficient location
• Simplified administration: hands-free appliance operation, eliminating most database tuning tasks
• Performance: unprecedented response times for both OLTP and OLAP queries
Powered by IBM Netezza
“Host” Computers (IDAA Server)
• SQL compiler, query plan, optimizer, administration
• 2 front-end hosts, IBM 3650M3 or 3850X5, clustered active-passive
• 2 Nehalem-EP quad-core 2.4GHz per host
Snippet Blades™ (S-Blades, SPUs)
• Processor and streaming DB logic
• High-performance database engine: streaming joins, aggregations, sorts, etc.
• e.g. TF12: 12 back-end SPUs (more details on following charts)
Disk Enclosures
• Slice of user data
• Swap and mirror partitions
• High-speed data streaming
• High compression rate
• EXP3000 JBOD enclosures
• 12 x 3.5” 1TB, 7200RPM, SAS (3Gb/s)
• max 116MB/s (200-500MB/s compressed data)
• e.g. TF12: 8 enclosures → 96 HDDs, 32TB uncompressed user data (→ 128TB), 9GB/s scan rate (~36GB/s with compression)
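The TF12 figures above can be cross-checked with a little arithmetic. The following is a back-of-the-envelope sketch only. The 4x compression factor is an assumption inferred from the 32TB → 128TB and 9GB/s → 36GB/s pairs on the slide, not a stated specification.

```python
# Back-of-the-envelope check of the TF12 numbers quoted on the slide.
ENCLOSURES = 8
DISKS_PER_ENCLOSURE = 12
MAX_DISK_MB_PER_S = 116      # per-disk maximum quoted on the slide
SCAN_RATE_GB_PER_S = 9       # quoted system scan rate
USER_DATA_TB = 32            # quoted uncompressed user data
ASSUMED_COMPRESSION = 4      # assumption: inferred from 32TB -> 128TB

disks = ENCLOSURES * DISKS_PER_ENCLOSURE
raw_disk_bandwidth_gb = disks * MAX_DISK_MB_PER_S / 1000
effective_scan_gb = SCAN_RATE_GB_PER_S * ASSUMED_COMPRESSION
effective_capacity_tb = USER_DATA_TB * ASSUMED_COMPRESSION

print(disks, raw_disk_bandwidth_gb, effective_scan_gb, effective_capacity_tb)
```

Under these assumptions the quoted 9GB/s scan rate sits below the aggregate per-disk maximum of roughly 11.1GB/s, which is what one would expect for a sustained rate.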
IDAA Query Execution
[Figure: query execution flow. Applications connect through the application interface to DB2 for z/OS. The DB2 optimizer routes each query either to the DB2 query execution run-time (for queries that cannot or should not be offloaded) or through the DRDA requestor to the accelerator, shown as an SMP host with several SPUs, each with CPU, FPGA, and memory. The legend distinguishes queries executed with and without the accelerator, and a heartbeat that carries accelerator availability and performance indicators. The figure labels the accelerator “Smart Analytics Optimizer” (ISAOpt).]
Integrating Replication - Requirements
• The Incremental Update capability is part of the base offering for all customers, not a separately orderable feature
• Fully integrated into IDAA
  – Managed via IDAA Studio
  – Integrated into IDAA software update
  – Integrated into IDAA HA concepts
  – Automated scheduling of maintenance operations (RUNSTATS / REORG) on IDAA
  – Automation possible via stored procedure
Complementing Existing Synchronization Options
• There are different options to synchronize tables between DB2 and IDAA
  – The choice depends on IDAA usage scenarios, update frequency, affinity to partitions, etc.

Synchronization options with their use cases, characteristics and requirements:

Full table refresh: the entire content of a database table is refreshed for accelerator processing
  • Existing ETL process replaces the entire table
  • Multiple sources or complex transformations
  • Smaller, un-partitioned tables
  • Reporting based on a consistent snapshot (“check point”)

Table partition refresh: for a partitioned database table, selected partitions can be refreshed for accelerator processing
  • Optimization for (time-) partitioned warehouse tables, appending changes “at the end”
  • More efficient than full table refresh for larger tables
  • Reporting based on a consistent snapshot (“check point”)

Incremental update: log-based capturing of changes and propagation to IDAA with low latency (typically a few minutes)
  • Scattered updates after a “bulk” load
  • Reporting on continuously updated data (e.g., an ODS), considering the most recent changes
  • More efficient for smaller updates than full table refresh
Reporting and Analytics on Continuously Changing Data
• With continuously changing data, users may see different results when they run the same query again
  – Users need to understand this behavior
• The “waitForReplication” subcommand of the Accelerator stored procedure can be used
  – It waits until all data committed at the time of the SP invocation has been applied to the target
[Figure: timeline showing updates to the database, users submitting queries, and two waitForReplication() calls]
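The guarantee described above can be illustrated with a toy simulation. This is a minimal sketch and not the stored procedure interface: the class, its method names, and the batch size are all hypothetical. It only shows that the wait covers data committed at invocation time, while later commits may still be outstanding when the call returns.

```python
class ReplicationSim:
    """Toy model: the source commits changes, the target applies them later."""

    def __init__(self):
        self.committed = 0   # number of changes committed on the source
        self.applied = 0     # number of changes applied on the target

    def commit(self, n=1):
        self.committed += n

    def apply_batch(self, max_changes):
        # The target catches up by at most max_changes per apply cycle.
        self.applied = min(self.committed, self.applied + max_changes)

    def wait_for_replication(self, batch_size=2, on_cycle=None):
        # Remember what was committed at invocation time; commits that
        # arrive later are not part of the guarantee.
        goal = self.committed
        cycles = 0
        while self.applied < goal:
            if on_cycle:
                on_cycle(self)   # lets the demo commit more data meanwhile
            self.apply_batch(batch_size)
            cycles += 1
        return goal, cycles


sim = ReplicationSim()
sim.commit(5)
# One more change is committed during every apply cycle.
goal, cycles = sim.wait_for_replication(batch_size=2, on_cycle=lambda s: s.commit(1))
print(goal, cycles, sim.applied, sim.committed)
```

When the call returns, everything committed before the invocation is on the target, but the source has already moved on, so a query issued afterwards can still miss the newest changes.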
Architecture
Architecture
[Figure: replication architecture. Labels recoverable from the figure:]
• DB2 for z/OS (insert, delete, update)
• Engine for DB2 z/OS (log reading)
• Private network (10G fiber)
• IBM PureData System for Analytics
• IDAA Server
• Engine for IBM Netezza (stage + apply changes)
• IDAA Database
• Access Server (manage engines and subscriptions)
• API
• Automation Code (creates data sources, subscriptions)
• Catalog information
• IDAA Stored Procedures (ACCEL_CONTROL_ACCELERATOR, ACCEL_ENABLE_REPLICATION, ...)
• JCL
• IDAA Studio
• <xml>
Properties of this Architecture
• Optimized for throughput
  – During normal operation, no disk I/O is involved
    • DB2 → log buffer → capture staging space → network → apply staging space → IDAA
  – Changes within the apply staging space are consolidated on the target
    • More than one change to the same row results in a single change
  – Mini-batches to leverage the Netezza bulk load interface
    • The source sends a unit of recovery (UR) to the target once its commit log record has been read
    • The target applies all URs that arrived during a 60s window (or earlier if a size limit is reached)
  – UPDATEs are decomposed into <DELETE, INSERT> pairs (and merged with “regular” DELETE and INSERT batches)
• Use of parallel UNLOAD with DB2 INTERNAL format to establish the initial snapshot of a table
  – Replication continues from this snapshot (capture point automatically managed)
• IDAA schedules REORG automatically as a low-priority background task once a threshold of “disorganization” is reached on Netezza
• Simple identity mapping of tables
  – No user-exits
  – No transformations
• Based on “production” components
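The consolidation, UPDATE decomposition, and mini-batch behavior listed above can be sketched as a small staging structure. This is an illustration under stated assumptions, not the product's implementation: the class name, the `max_changes` size limit, and the use of a single unique key per row are hypothetical. Only the 60s window and the <DELETE, INSERT> decomposition come from the slide.

```python
import time


class ApplyStagingSpace:
    """Toy apply-side staging area; rows are identified by a unique key."""

    def __init__(self, window_seconds=60.0, max_changes=10_000, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.max_changes = max_changes  # hypothetical size limit
        self.clock = clock
        self.window_start = clock()
        self.deletes = set()   # keys to DELETE on the target
        self.inserts = {}      # key -> latest row image to INSERT

    def stage(self, op, key, row=None):
        if op in ("DELETE", "UPDATE"):
            # A row first inserted in this batch is not on the target yet,
            # so dropping the staged INSERT is enough.
            created_in_batch = key in self.inserts and key not in self.deletes
            self.inserts.pop(key, None)
            if not created_in_batch:
                self.deletes.add(key)
        if op in ("INSERT", "UPDATE"):
            # An UPDATE contributes the INSERT half of its <DELETE, INSERT> pair.
            self.inserts[key] = row

    def should_flush(self):
        elapsed = self.clock() - self.window_start
        staged = len(self.deletes) + len(self.inserts)
        return elapsed >= self.window_seconds or staged >= self.max_changes

    def flush(self):
        # Hand over one DELETE batch and one INSERT batch, then start a new window.
        batch = (sorted(self.deletes), dict(self.inserts))
        self.deletes, self.inserts = set(), {}
        self.window_start = self.clock()
        return batch


now = [0.0]  # fake clock so the example does not have to sleep
space = ApplyStagingSpace(clock=lambda: now[0])
space.stage("INSERT", 1, {"qty": 1})
space.stage("UPDATE", 1, {"qty": 2})   # folds into the staged INSERT
space.stage("UPDATE", 2, {"qty": 7})
space.stage("UPDATE", 2, {"qty": 8})   # only the latest image survives
space.stage("DELETE", 3)
flush_early = space.should_flush()
now[0] = 61.0
flush_late = space.should_flush()
deletes, inserts = space.flush()
print(deletes, inserts)
```

Five staged changes collapse into two deletes and two inserts, which is the kind of reduction that makes a bulk-style apply worthwhile.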
Incremental Update - Table Refresh Integration
IDAA table refresh is used to take the initial snapshot or to re-synchronize after bulk changes.

Use case: Enable incremental update on a newly added table (state: INITIAL_LOAD_PENDING)
Details: Lock mode TABLE or TABLESET is used for the load to prevent in-flight changes while the UNLOADs are running
Operations:
● Enable replication for the table
● Load the table (sets the capture point when the load has completed)
● Start replication

Use case: Re-load a loaded, replicated table, e.g. because of a non-logged operation on the source table
Details: Assumption: the table is synchronized after the re-load; replication continues from this new “snapshot”
Operations:
● Full reload or partition reload of the table (sets a new capture point when the load has completed)
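Why the capture point matters can be shown with a toy log replay. In this sketch the log is a list of `(position, operation, key, value)` tuples. That record layout and the function name are hypothetical and chosen only for illustration. The point is that applying the log records written after the capture point on top of the snapshot reproduces the source table.

```python
def apply_log(rows, log, after_pos=0):
    """Apply log records whose position is greater than after_pos."""
    result = dict(rows)
    for pos, op, key, value in log:
        if pos <= after_pos:
            continue  # already contained in the snapshot
        if op == "DELETE":
            result.pop(key, None)
        else:  # INSERT or UPDATE
            result[key] = value
    return result


log = [
    (1, "INSERT", "a", 10),
    (2, "INSERT", "b", 20),
    (3, "UPDATE", "a", 11),
    (4, "DELETE", "b", None),
    (5, "INSERT", "c", 30),
]

capture_point = 3
# The locked load sees exactly the changes up to the capture point.
snapshot = apply_log({}, [rec for rec in log if rec[0] <= capture_point])
# Replication continues from the snapshot with the later log records.
target = apply_log(snapshot, log, after_pos=capture_point)
source = apply_log({}, log)
print(snapshot, target, source)
```

If changes could slip in while the UNLOADs run, the snapshot would no longer correspond to a single log position, which is why the load takes a lock.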
User Interface
Incremental update UI elements are visible only if the function is enabled on the DB2 subsystem
• Start / stop the replication process (per subsystem-accelerator pair)
• Enable / disable replication (per table)
• Trace collection
• Information on replication latency and events
High-Availability Setup
• Capture side
  – One active capture engine per data sharing group (DS group)
    • Multiple stand-by instances, coordinated via ENQ
    • Shared metadata
  – z/OS Communications Server migrates the D-VIPA in case of fail-over
• Apply side (appliance internal)
  – Integration into cluster management (active-standby)
  – Mirrored disk between active and standby host (shared metadata)
  – All components are migrated to the standby host and restarted
  – Replication continues automatically where it left off
[Figure: DS group with Member 1 and Member 2 on LPAR 1 and LPAR 2, one capture engine active and one in hot-standby, the catalog, and the D-VIPA]
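The capture-side coordination can be pictured as several instances competing for one exclusive resource. The sketch below uses a Python `threading.Lock` as a stand-in for the z/OS ENQ. That substitution, the class, and its method names are assumptions for illustration and say nothing about how the product is implemented.

```python
import threading


class CaptureInstance:
    """Toy capture engine that is active only while it holds the shared lock."""

    def __init__(self, name, enq):
        self.name = name
        self.enq = enq        # shared exclusive lock, standing in for the ENQ
        self.active = False

    def try_activate(self):
        # Non-blocking attempt: only one instance can hold the lock.
        if not self.active and self.enq.acquire(blocking=False):
            self.active = True
        return self.active

    def fail(self):
        # Simulate a failure of the active instance: the lock is freed.
        if self.active:
            self.active = False
            self.enq.release()


enq = threading.Lock()
member1 = CaptureInstance("member1", enq)
member2 = CaptureInstance("member2", enq)

first = member1.try_activate()       # member1 becomes active
second = member2.try_activate()      # member2 stays on stand-by
member1.fail()
takeover = member2.try_activate()    # stand-by takes over
rejoin = member1.try_activate()      # the failed member is now the stand-by
print(first, second, takeover, rejoin)
```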
Replication Tuning
• Replication on the target system produces DELETE statements with predicates on the unique columns (index or constraint) of the source table
  – “Clustered base tables” can be used to locate the rows to be deleted more efficiently
  – Caveat: this may conflict with other tuning objectives (e.g. the table is already clustered on time columns)
• If multiple unique constraints are available, the “best” set of columns is selected automatically
  – The set with the minimal number of columns (partially) matching existing clustering columns
• If tables are not clustered yet, the system suggests clustering on source table columns with a unique index or unique constraint
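The column-set selection can be sketched as a ranking function. The slide does not spell out the exact rule, so the ordering used here is an assumption and only one possible reading: prefer the unique set that shares the most columns with the existing clustering columns, and break ties by the smaller number of columns. The function name and the example column names are hypothetical.

```python
def pick_delete_columns(unique_sets, clustering_columns):
    """Pick the unique column set to use in DELETE predicates.

    Assumed ranking: most overlap with the clustering columns first,
    then fewest columns.
    """
    clustering = set(clustering_columns)

    def rank(columns):
        overlap = len(clustering & set(columns))
        return (-overlap, len(columns))

    return min(unique_sets, key=rank)


unique_sets = [
    ("order_id",),
    ("customer_id", "order_date", "line_no"),
    ("order_date", "seq"),
]

clustered_choice = pick_delete_columns(unique_sets, ["order_date"])
unclustered_choice = pick_delete_columns(unique_sets, [])
print(clustered_choice, unclustered_choice)
```

With a table clustered on `order_date`, the two-column set that includes it wins. Without clustering columns, the shortest unique set is chosen.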
Evaluation
Impact on Concurrently Running Queries
• Validated that incremental update has only a minor impact on query response time
  – “No” workload:
    • 10 parallel queries: 5 streaming, 5 aggregation / group by
  – “Medium” workload:
    • 10 parallel queries: 5 streaming, 5 aggregation / group by
    • Replication from 1 subsystem: 300,000 rows/minute (5,000 rows/s)
  – “Full” workload:
    • 10 parallel queries: 5 streaming, 5 aggregation / group by
    • Replication from 2 subsystems: 2.0 million rows/minute (about 33,333 rows/s)
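The per-second rates follow directly from the per-minute figures. The short check below uses only the numbers quoted on the slide, and the variable names are illustrative.

```python
def rows_per_second(rows_per_minute):
    # Convert a per-minute replication rate into rows per second.
    return rows_per_minute / 60


medium = rows_per_second(300_000)     # "Medium": one subsystem
full = rows_per_second(2_000_000)     # "Full": two subsystems combined
ratio = full / medium                 # how much heavier "Full" is

print(medium, round(full), round(ratio, 2))
```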
Table Refresh “Best Practices”
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
