Raimonds Simanovskis


Multidimensional
Data Analysis
with JRuby
Raimonds Simanovskis

       github.com/rsim




         @rsim

             .com
Relational
data model
SQL is good for detailed
       data queries
           Get all sales transactions in
           USA, California
SELECT customers.fullname, products.product_name,
  sales.sales_date, sales.unit_sales, sales.store_sales
FROM sales
  LEFT JOIN products ON sales.product_id = products.id
  LEFT JOIN customers ON sales.customer_id = customers.id
WHERE customers.country = 'USA' AND customers.state_province = 'CA'
SQL becomes complex
       for analytical queries
           Get total sales in USA, California
           in Q1, 2011 by main product groups

SELECT product_class.product_family,
       SUM(sales.unit_sales) unit_sales_sum,
       SUM(sales.store_sales) store_sales_sum
    FROM sales
      LEFT JOIN product ON sales.product_id = product.product_id
      LEFT JOIN product_class
           ON product.product_class_id = product_class.product_class_id
      LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
      LEFT JOIN customer ON sales.customer_id = customer.customer_id
    WHERE time_by_day.the_year = 2011 AND time_by_day.quarter = 'Q1'
      AND customer.country = 'USA' AND customer.state_province = 'CA'
    GROUP BY product_class.product_family
If SQL is not good
   then we need
      NoSQL!
Maybe write a distributed
map-reduce function?




                http://browsertoolkit.com/fault-tolerance.png
Multidimensional
      Data Model
Multidimensional cubes

     Dimensions
Hierarchies and levels

      Measures
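
To make these terms concrete, here is a minimal sketch in plain Ruby. It does not use any OLAP library; the fact rows, their values, and the `aggregate` helper are all made up for illustration. Each row carries dimension members (year, quarter, state, product family) and measures (unit sales, store sales). A query fixes some dimension members and sums the measures for each member of the dimension placed on rows.

```ruby
# Hypothetical in-memory fact rows; names and values are illustrative only.
FACTS = [
  { year: 2011, quarter: 'Q1', state: 'CA', family: 'Food',  unit_sales: 10, store_sales: 25.0 },
  { year: 2011, quarter: 'Q1', state: 'CA', family: 'Drink', unit_sales: 4,  store_sales: 9.5 },
  { year: 2011, quarter: 'Q1', state: 'WA', family: 'Food',  unit_sales: 7,  store_sales: 18.0 },
  { year: 2011, quarter: 'Q2', state: 'CA', family: 'Food',  unit_sales: 3,  store_sales: 8.0 },
  { year: 2011, quarter: 'Q1', state: 'CA', family: 'Food',  unit_sales: 5,  store_sales: 12.5 }
].freeze

MEASURES = %i[unit_sales store_sales].freeze

# Slice the facts by fixed dimension members, then sum every measure
# for each member of the dimension placed on rows.
def aggregate(facts, rows:, where: {})
  sliced = facts.select { |f| where.all? { |dim, member| f[dim] == member } }
  sliced.group_by { |f| f[rows] }.transform_values do |group|
    MEASURES.to_h { |m| [m, group.sum { |f| f[m] }] }
  end
end

# Total sales in California in Q1 2011 by product family.
RESULT = aggregate(FACTS, rows: :family,
                          where: { year: 2011, quarter: 'Q1', state: 'CA' })
```

A real OLAP engine adds hierarchies, caching, and a query language on top, but the slice-then-aggregate idea is the same.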
OLAP technologies
  On-Line Analytical Processing
Commercial Vendors

  Oracle Essbase
  SAP BusinessObjects
  Oracle OLAP
  Cognos
  Analysis Services
RailsWayCon: Multidimensional Data Analysis with JRuby
MDX query language
          Get total units sold and sales amount
          in USA, California in Q1, 2011
          by main product groups


SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
       [Product].children ON ROWS
FROM   [Sales]
WHERE ( [Time].[2011].[Q1], [Customers].[USA].[CA] )
http://github.com/rsim/mondrian-olap
(R)OLAP schema
Dimensional model:
 cubes
 dimensions (hierarchies & levels)
 measures, calculated measures


                   Mapping


Relational model:
 fact tables, dimension tables
 joined by foreign keys
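
The mapping idea can be sketched in plain Ruby. This is an illustration only, not how Mondrian generates SQL: the `MAPPING` hash and the `rolap_sql` helper are hypothetical, and the table and column names are borrowed from the SQL slides above. Given a measure list and a level, the helper joins the fact table to the dimension table by its foreign key and groups by the level column.

```ruby
# Hypothetical mapping from dimensional names to relational tables and columns.
MAPPING = {
  fact_table: 'sales',
  measures: {
    'Unit Sales'  => 'unit_sales',
    'Store Sales' => 'store_sales'
  },
  levels: {
    'Year'    => { table: 'time_by_day', key: 'time_id', column: 'the_year' },
    'Quarter' => { table: 'time_by_day', key: 'time_id', column: 'quarter' }
  }
}.freeze

# Build one aggregate SQL statement: join the fact table to the level's
# dimension table by its foreign key, then group by the level column.
def rolap_sql(mapping, measures:, by:)
  fact = mapping[:fact_table]
  level = mapping[:levels].fetch(by)
  sums = measures.map { |m| "SUM(#{fact}.#{mapping[:measures].fetch(m)})" }
  group_col = "#{level[:table]}.#{level[:column]}"
  [
    "SELECT #{group_col}, #{sums.join(', ')}",
    "FROM #{fact}",
    "JOIN #{level[:table]} ON #{fact}.#{level[:key]} = #{level[:table]}.#{level[:key]}",
    "GROUP BY #{group_col}"
  ].join("\n")
end

SQL = rolap_sql(MAPPING, measures: ['Unit Sales', 'Store Sales'], by: 'Quarter')
puts SQL
```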
OLAP schema
                       definition
schema = Mondrian::OLAP::Schema.define do
  cube 'Sales' do
    table 'sales'
    dimension 'Gender', :foreign_key => 'customer_id' do
      hierarchy :has_all => true, :primary_key => 'customer_id' do
        table 'customer'
        level 'Gender', :column => 'gender', :unique_members => true
      end
    end
    dimension 'Time', :foreign_key => 'time_id' do
      hierarchy :has_all => false, :primary_key => 'time_id' do
        table 'time_by_day'
        level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true
        level 'Quarter', :column => 'quarter', :unique_members => false
        level 'Month', :column => 'month_of_year', :type => 'Numeric', :unique_members => false
      end
    end
    measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum'
    measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum'
  end
end
Query Builder in
              Ruby
       Get total units sold and sales amount
       in USA, California in Q1, 2011
       by main product groups

olap.from('Sales').
columns('[Measures].[Unit Sales]',
        '[Measures].[Store Sales]').
rows('[Product].children').
where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
execute
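
One way to picture what such a builder does is that each chained call records part of the query, and the MDX text is assembled at the end. The sketch below is a toy under that assumption. `ToyQuery` and its `to_mdx` method are hypothetical names for illustration and are not the mondrian-olap implementation.

```ruby
# ToyQuery is a hypothetical class for illustration, not part of mondrian-olap.
class ToyQuery
  def initialize(cube)
    @cube = cube
    @axes = {}
    @where = []
  end

  # Each builder method stores its arguments and returns self so calls chain.
  def columns(*members)
    @axes['COLUMNS'] = members
    self
  end

  def rows(*members)
    @axes['ROWS'] = members
    self
  end

  def where(*members)
    @where = members
    self
  end

  # Assemble the recorded parts into one MDX statement.
  def to_mdx
    axes = @axes.map { |name, members| "#{set(members)} ON #{name}" }
    mdx = "SELECT #{axes.join(', ')} FROM [#{@cube}]"
    mdx += " WHERE (#{@where.join(', ')})" unless @where.empty?
    mdx
  end

  private

  # A single expression is used as is; several members are wrapped in a set.
  def set(members)
    members.length == 1 ? members.first : "{#{members.join(', ')}}"
  end
end

MDX = ToyQuery.new('Sales')
  .columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]')
  .rows('[Product].children')
  .where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]')
  .to_mdx
puts MDX
```

Apart from whitespace, the generated text matches the MDX statement on the earlier slide.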
Also more complex
                queries
           Get sales amount and profit %
           of top 50 products sold in USA and Canada
           during Q1, 2011

olap.from('Sales').
with_member('[Measures].[ProfitPct]').
  as('(Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales]',
  :format_string => 'Percent').
columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
rows('[Product].children').crossjoin('[Customers].[Canada]', '[Customers].[USA]').
  top_count(50, '[Measures].[Store Sales]').
where('[Time].[2011].[Q1]').
execute
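
What the calculated member and the top-count step compute can be shown in plain Ruby over a few made-up rows. This is a sketch of the arithmetic only: the `TOTALS` data and both helper methods are hypothetical, and a real OLAP engine evaluates these expressions inside the cube.

```ruby
# Made-up per-product totals; names and numbers are illustrative only.
TOTALS = [
  { product: 'Food',           store_sales: 200.0, store_cost: 150.0 },
  { product: 'Drink',          store_sales: 80.0,  store_cost: 40.0 },
  { product: 'Non-Consumable', store_sales: 120.0, store_cost: 90.0 }
].freeze

# Calculated measure: (Store Sales - Store Cost) / Store Sales.
def profit_pct(row)
  (row[:store_sales] - row[:store_cost]) / row[:store_sales]
end

# Keep the n rows with the highest store sales, in descending order.
def top_count(rows, n)
  rows.max_by(n) { |r| r[:store_sales] }
end

TOP2 = top_count(TOTALS, 2).map { |r| [r[:product], profit_pct(r)] }
```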
Demo
Used in eazybi.com


