Arrays in database systems,
     the next frontier?
            Martin Kersten
                  CWI




      NLeSC 9 Nov 2011
“We can't solve
problems by using
the same kind of
thinking we used
when we created
them.”




Agenda

A crash course on column-stores


Column stores for science applications


The SciQL array query language




The world of column stores
                             Motivation

Relational DBMSs have dominated since the late 1970s / 1980s
   •     Transactional workloads (OLTP, row-wise access)
   •     I/O-based processing
   •     Ingres, PostgreSQL, MySQL, Oracle, SQL Server, DB2, …

Column stores have dominated product development since 2005
    •    Data warehouses and business intelligence applications
    •    Startups: Infobright, Aster Data, Greenplum, LucidDB, …
    •    Commercial: Microsoft, IBM, SAP, …
                         MonetDB, the pioneer

The world of column stores
Workload changes: Transactions (OLTP) vs ...




The world of column stores
Workload changes: ... vs OLAP, BI, Data Mining, ...




The world of column stores

             Databases hit The Memory Wall

•  Detailed and exhaustive analysis for different workloads using
    4 RDBMSs by Ailamaki, DeWitt, Hill, and Wood in VLDB 1999:
    “DBMSs On A Modern Processor: Where Does Time Go?”
•  CPU is 60%-90% idle,
    waiting for memory:
     •  L1 data stalls
     •  L1 instruction stalls
     •  L2 data stalls
     •  TLB stalls
     •  Branch mispredictions
     •  Resource stalls




The world of column stores
Hardware Changes: The Memory Wall




Trip to memory = 1000s of instructions!




Storing Relations in MonetDB




[Figure: a relation stored as five columns (BATs), each with a Void head that starts at 1000]

Virtual OID: seqbase=1000 (increment=1)
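
A minimal Python sketch of the virtual OID idea, not MonetDB's implementation: the head is never stored, so an OID is just the seqbase plus the position in the tail array. The class name `VoidColumn` and its methods are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoidColumn:
    """Hypothetical column whose head is virtual: OIDs are seqbase, seqbase+1, ..."""
    seqbase: int
    tail: List[int]

    def oid_at(self, pos: int) -> int:
        # The OID is computed from the position; it is never materialized.
        return self.seqbase + pos

    def lookup(self, oid: int) -> int:
        # Positional lookup: one subtraction and one array index.
        return self.tail[oid - self.seqbase]


age = VoidColumn(seqbase=1000, tail=[27, 31, 45])
second_oid = age.oid_at(1)
value_at_1002 = age.lookup(1002)
```

Because every column of a relation shares the same dense OID sequence, a tuple can be reassembled by reading the same position in each column.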
BAT Data Structure




BAT: binary association table (Head and Tail column)
BUN: binary unit

Head & Tail (BUN heap):
   - consecutive memory block (array)
   - memory-mapped file

Tail Heap:
   - best-effort duplicate elimination for strings
     (~ dictionary encoding)

Hash tables, T-trees, R-trees, ...
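
The dictionary-encoding idea behind the tail heap can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and unlike the best-effort scheme on the slide this sketch always eliminates every duplicate.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[int], List[str]]:
    """Store each distinct string once in a heap; the column keeps heap offsets."""
    heap: List[str] = []
    index: Dict[str, int] = {}
    offsets: List[int] = []
    for v in values:
        if v not in index:
            # First occurrence: append the string to the heap.
            index[v] = len(heap)
            heap.append(v)
        offsets.append(index[v])
    return offsets, heap


def dictionary_decode(offsets: List[int], heap: List[str]) -> List[str]:
    """Rebuild the original column from offsets and heap."""
    return [heap[o] for o in offsets]


offsets, heap = dictionary_encode(["NL", "DE", "NL", "NL"])
```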
MonetDB Front-end: SQL

l    SQL 2003
            l      Parse SQL into logical n-ary relational algebra tree
            l      Translate n-ary relational algebra into logical 2-ary relational algebra
            l      Turn logical 2-ary plan into physical 2-ary plan (MAL program)


l    Front-end specific strategic optimization:
      l    Heuristic optimization during all three previous steps
            l  Primary key and distinct constraints:
              l    Create and maintain hash indices
            l      Foreign key constraints
              l    Create and maintain foreign key join indices

MonetDB Front-end: SQL
EXPLAIN SELECT a, z FROM t, s WHERE t.c = s.x;

function user.s2_1():void;
barrier _73 := language.dataflow();
  _2:bat[:oid,:int] := sql.bind("sys","t","c",0);
  _7:bat[:oid,:int] := sql.bind("sys","s","x",0);
  _10 := bat.reverse(_7);
  _11 := algebra.join(_2,_10);
  _13 := algebra.markT(_11,0@0);
  _14 := bat.reverse(_13);
  _15:bat[:oid,:int] := sql.bind("sys","t","a",0);
  _17 := algebra.leftjoin(_14,_15);
  _18 := bat.reverse(_11);
  _19 := algebra.markT(_18,0@0);
  _20 := bat.reverse(_19);
  _21:bat[:oid,:int] := sql.bind("sys","s","z",0);
  _23 := algebra.leftjoin(_20,_21);
exit _73;
  _24 := sql.resultSet(2,1,_17);
  sql.rsColumn(_24,"sys.t","a","int",32,0,_17);
  sql.rsColumn(_24,"sys.s","z","int",32,0,_23);
  _33 := io.stdout();
  sql.exportResult(_33,_24);
end s2_1;

MonetDB/5 Back-end: MAL
l    MAL: Monet Assembly Language
      l  textual interface
      l    Interpreted language
l    Designed as system interface language
       l  Reduced, concise syntax
       l  Strict typing
       l  Meant for automatic generation and parsing/rewriting/processing
       l  Not meant to be typed by humans
l    Efficient parser
       l  Low overhead
       l  Inherent support for tactical optimization: MAL -> MAL
       l  Support for optimizer plug-ins
       l  Support for runtime schedulers
l    Binary-algebra core
l    Flow control (MAL is computational complete)‫‏‬
Processing Model (MonetDB Kernel)

•    Bulk processing:
      •    Full materialization of all intermediate results

•    Binary (i.e., 2-column) algebra core:
      •    select, join, semijoin, outerjoin
      •    union, intersection, diff (BAT-wise & column-wise)
      •    group, count, max, min, sum, avg
      •    reverse, mirror, mark

•    Runtime operational optimization:
      •    Choosing optimal algorithm & implementation according to
            input properties and system status
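
Runtime operational optimization can be illustrated with a small Python sketch: one logical range select, two physical algorithms, chosen from a property of the input. The function `range_select` and the `is_sorted` flag are assumptions for illustration; the kernel tracks such properties itself and has many more variants.

```python
from bisect import bisect_left, bisect_right
from typing import List


def range_select(tail: List[int], low: int, high: int, is_sorted: bool) -> List[int]:
    """Return the positions whose value v satisfies low <= v <= high."""
    if is_sorted:
        # Sorted input: two binary searches delimit the qualifying positions.
        first = bisect_left(tail, low)
        last = bisect_right(tail, high)
        return list(range(first, last))
    # No usable property: fall back to a full scan.
    return [pos for pos, v in enumerate(tail) if low <= v <= high]


sorted_hits = range_select([1, 3, 5, 7, 9], 3, 7, is_sorted=True)
scan_hits = range_select([9, 3, 7, 1, 5], 3, 7, is_sorted=False)
```

Both calls return whole position lists, in line with the bulk-processing model: each operator fully materializes its result before the next one runs.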



Processing Model (MonetDB Kernel)
•    Heavy use of code expansion to reduce cost

1 algebra operator          select()
3 overloaded operators      select(“=”,value)   select(“between”,L,H)   select(“fcn”,parm)
10 operator algorithms      scan   hash-lookup   bin-search   bin-tree   pos-lookup
~1500(!) routines           scan_range_select_oid_int(), hash_equi_select_void_str(), …
(macro expansion)
  •          ~1500 selection routines
  •          149 unary operations
  •          335 join/group operations
  •          ...



The Software Stack


Front-ends     XQuery                SQL 03      SciQL   RDF

                                    Optimizers


Back-end(s)   MonetDB 4             MonetDB 5


  Kernel                  MonetDB kernel




The Software Stack

                                                 Strategic optimization

Front-ends     XQuery                SQL 03               MAL

                                    Optimizers   Tactical optimization:
                                                 MAL -> MAL rewrites

Back-end(s)   MonetDB 4             MonetDB 5             MAL

                                                      Runtime
  Kernel                  MonetDB kernel             operational
                                                     optimization




MonetDB vs Traditional DBMS Architecture

l    Architecture-Conscious Query Processing
                      vs Magnetic disk I/O conscious processing
                     - 
      l    Data layout, algorithms, cost models
l    RISC Relational Algebra (operator-at-a-time)
                     - vs Tuple-at-a-time Iterator Model
      l    Faster through simplicity: no tuple expression interpreter
l    Multi-Model: ODMG, SQL, XML/XQuery, ..., RDF/SPARQL
                     vs Relational with Bolt-on Subsystems
                     - 
      l    Columns as the building block for complex data structures

l    Decoupling of Transactions from Execution/Buffering
                      vs ARIES integrated into Execution/Buffering/Indexing
                     - 
      l    ACID, but not ARIES.. Pay as you need transaction overhead.

l    Run-Time Indexing and Query Optimization
                     - vs Static DBA/Workload-driven Optimization & Indexing
      l    Extensible Optimizer Framework;
      l    cracking, recycling, sampling-based runtime optimization
Evolution

It is not the strongest of the
species that survives, nor the
most intelligent, but the one
most responsive to change. 

Charles Darwin (1809 - 1882)



Agenda

A crash course on column-stores


Column stores for science applications


The SciQL array query language




SkyServer Schema




[Figure: SkyServer schema diagram]
   446 columns, >585 million rows
   6 columns, >20 billion rows




Recycler: motivation & idea
“An architecture for recycling intermediates in a column-store”.
Ivanova, Kersten, Nes, Goncalves. ACM TODS 35(4), Dec. 2010


Motivation:
     •  Scientific databases, data analytics
     •  Terabytes of data (observational, transactional)
     •  Prevailing read-only workload
     •  Ad-hoc queries with commonalities

Background:
     •  Operator-at-a-time execution paradigm
          -  Automatic materialization of intermediates
     •  Canonical column-store organization
          -  Intermediates have reduced dimensionality and finer granularity
          -  Simplified overlap analysis

Recycling idea:
     •  Instead of garbage collecting, keep the intermediates and reuse them
          -  Speeds up query streams with commonalities
          -  Low cost and self-organization
Recycler: fit into MonetDB

[Figure: the Recycler in the MonetDB software stack. Labels: SQL, XQuery, MAL, Tactical Optimizer, Recycler Optimizer, MAL, Run-time Support, MonetDB Kernel, Admission & Eviction, Recycle Pool, MonetDB Server. The MAL plan below is shown both before and after the optimizer.]

function user.s1_2(A0:date, ...):void;
    X5 := sql.bind("sys","lineitem",...);
    X10 := algebra.select(X5,A0);
    X12 := sql.bindIdx("sys","lineitem",...);
    X15 := algebra.join(X10,X12);
    X25 := mtime.addmonths(A1,A2);
    ...
Recycler: instruction matching

Run time comparison of
•    instruction types
•    argument values

Exact matching:
     Y3 := sql.bind("sys","orders","o_orderdate",0);
against the recycle pool:
     X1 := sql.bind("sys","orders","o_orderdate",0);
     …

Name    Value          Data type              Size
X1      10             :bat[:oid,:date]
T1      “sys”          :str
T2      “orders”       :str
…
Recycler: instruction subsumption

New instruction:
     Y3 := algebra.select(X1,20,45);

Recycle pool:
     X3 := algebra.select(X1,10,80);
     …
     X5 := algebra.select(X1,20,60);

Match: X5

 Name      Value     Data type               Size
 X1        10        :bat[:oid,:int]         2000
 X3        130       :bat[:oid,:int]          700
 X5        150       :bat[:oid,:int]          350
 …
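
A minimal Python sketch of subsumption for range selects, assuming inclusive bounds and a pool keyed by (low, high); all names here are hypothetical. A new select is answered from the smallest cached intermediate whose range covers the requested one, which is why the smaller X5 is preferred over X3.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]


def find_subsuming(cached: Dict[Range, List[int]], low: int, high: int) -> Optional[Range]:
    """Pick the smallest cached result whose range covers [low, high]."""
    candidates = [r for r in cached if r[0] <= low and high <= r[1]]
    if not candidates:
        return None
    return min(candidates, key=lambda r: len(cached[r]))


def select_with_subsumption(base: List[int], cached: Dict[Range, List[int]],
                            low: int, high: int) -> List[int]:
    source_range = find_subsuming(cached, low, high)
    # Scan the covering intermediate if there is one, else the base column.
    source = cached[source_range] if source_range is not None else base
    result = [v for v in source if low <= v <= high]
    cached[(low, high)] = result  # the new intermediate is kept as well
    return result


base = list(range(100))                      # stands in for X1
cached = {
    (10, 80): [v for v in base if 10 <= v <= 80],   # stands in for X3
    (20, 60): [v for v in base if 20 <= v <= 60],   # stands in for X5
}
chosen = find_subsuming(cached, 20, 45)
y3 = select_with_subsumption(base, cached, 20, 45)
```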




Recycler: SkyServer evaluation

Sloan Digital Sky Survey / SkyServer
http://cas.sdss.org
•    100 GB subset of DR4
•    100-query batch from the January 2008 log
•    1.5 GB of intermediates, 99% reuse
•    Join intermediates are the major consumer of memory and the major
      contributor to savings




Agenda

A crash course on column-stores


Column stores for science applications


The SciQL array query language




What is an array?
An array is a systematic arrangement of objects
 addressed by dimension values.
    Get(A, X, Y,…) => Value
    Set(A, X, Y,…) <= Value


There are many species:
 vector, bit array, dynamic array, parallel array,
 sparse array, variable length array, jagged array




Who needs them anyway?
Seismology           – partial time-series
Climate simulation – temporal ordered grid
Astronomy            – temporal ordered images
Remote sensing       – image processing
Social networks      – graph algorithms
Genomics             – ordered strings


Scientists ‘love them’:
  MSEED, NETCDF, FITS, CSV, …
Arrays in DBMS
Relational prototype built on arrays, Peterlee
 IS (1975)


Persistent programming languages, Astral (1980),
 Plain (1980)


Object-orientation and persistent languages were
 the make-believe to handle them, O2 (1992)




PostgreSQL 8.3
Array declarations:
CREATE TABLE sal_emp ( name text, pay_by_quarter integer[], schedule text[][]);
CREATE TABLE tictactoe ( squares integer[3][3] );



Array operations: denotation ([]), contains (@>), is
  contained in (<@), append, concat (||),
  dimension, lower, upper, prepend, to-string, from-
  string


Array constraints: none, no enforcement of
  dimensions.
MySQL
From the MySQL forum, May 2010:
“
> How to store multiple values in a single field? Is there any array
data type concept in mysql?
>
As Jörg said "Multiple values in a single field" would be an explicit
violation of the relational model...
”
Is there any experience beyond encoding it as blobs?




Rasdaman
Breaks large C++ arrays (rasters) into disjoint
  chunks


Maps chunks into large binary objects (blobs)


Provides a function interface to access them


RASQL, an SQL92 extension


Known to work up to 12 TB.
SciDB
Breaks large C++ arrays (rasters) into overlapping
  chunks


Built storage manager from scratch


Map-reduce processing model


Provides a function interface to access them


AQL, a crippled SQL92
What is the problem?
-  Appropriate array denotations?
-  Functionally complete operation set?
-  Scale?
-  Size limitations due to (blob) representations?
-  Community awareness?




MonetDB SciQL

SciQL (pronounced ‘cycle’)
•  A backward compatible extension of SQL’03
•  Symbiosis of relational and array paradigm
•  Flexible structure-based grouping
•  Capitalizes on the MonetDB physical array storage
  •  Recycling, an adaptive ‘materialized view’
  •  Zero-cost attachment contract for cooperative clients
                http://www.cwi.nl/~mk/SciQL.pdf



Table vs arrays

CREATE TABLE tmp
A collection of tuples


Indexed by a (primary) key


Default handling


Explicitly created using
  INS/UPD/DEL



Table vs arrays

CREATE TABLE tmp               CREATE ARRAY tmp
A collection of tuples         A collection of a priori defined tuples


Indexed by a (primary) key     Indexed by dimension expressions


Default handling               Implicitly defined by default value


Explicitly created using       To be updated with INS/DEL/UPD
  INS/UPD/DEL



SciQL examples
CREATE TABLE matrix (
  x integer,
  y integer,
  value float,
  PRIMARY KEY (x,y) );


INSERT INTO matrix VALUES
(0,0,0),(0,1,0),(1,1,0),(1,0,0);

         0      0   0
         0      1   0
         1      1   0
         1      0   0
SciQL examples
CREATE TABLE matrix (                 CREATE ARRAY matrix (
  x integer,                               x integer DIMENSION[2],
  y integer,                               y integer DIMENSION[2],
  value float,                             value float DEFAULT 0);
  PRIMARY KEY (x,y) );


INSERT INTO matrix VALUES
(0,0,0),(0,1,0),(1,1,0),(1,0,0);

Table rows (x, y, value):             Array cells (y down, x across):
         0      0   0                       y=1     0      0
         0      1   0                       y=0     0      0
         1      1   0                              x=0    x=1
         1      0   0                 Cells outside the bounds are null.
SciQL examples
CREATE TABLE matrix (                 CREATE ARRAY matrix (
  x integer,                               x integer DIMENSION[2],
  y integer,                               y integer DIMENSION[2],
  value float,                             value float DEFAULT 0);
  PRIMARY KEY (x,y) );


DELETE FROM matrix WHERE y=1;         DELETE FROM matrix WHERE y=1;
                                      A hole in the array

Table rows (x, y, value):             Array cells (y down, x across):
        0       0   0                       y=1    null   null
        1       0   0                       y=0     0      0
                                                   x=0    x=1

SciQL examples
CREATE TABLE matrix (                 CREATE ARRAY matrix (
  x integer,                               x integer DIMENSION[2],
  y integer,                               y integer DIMENSION[2],
  value float,                             value float DEFAULT 0);
  PRIMARY KEY (x,y) );


INSERT INTO matrix VALUES             INSERT INTO matrix VALUES
(0,1,1), (1,1,2);                     (0,1,1), (1,1,2);

Table rows (x, y, value):             Array cells (y down, x across):
         0      0   0                       y=1     1      2
         1      0   0                       y=0     0      0
         0      1   1                              x=0    x=1
         1      1   2
SciQL unbounded arrays
CREATE TABLE matrix (                 CREATE ARRAY matrix (
  x integer,                               x integer DIMENSION,
  y integer,                               y integer DIMENSION,
  value float,                             value float DEFAULT 0);
  PRIMARY KEY (x,y) );


INSERT INTO matrix VALUES             INSERT INTO matrix VALUES
(0,2,1), (0,1,2);                     (0,2,1), (0,1,2);

Table rows (x, y, value):             Array cells as drawn on the slide:
         0      2   1                         2     1      0
         0      1   2                         1     0      0
                                              0     0      2
                                                    0      1
SciQL Dimensions
Unbounded Dimensions
  scalar-type DIMENSION


Bounded Dimensions
  scalar-type DIMENSION[stop]
  scalar-type DIMENSION[first: step: stop]
  scalar-type DIMENSION[*: *: *]

timestamp DIMENSION[timestamp ‘2010-01-19’ : * : timestamp ‘1’ minute]

SciQL table queries

CREATE ARRAY matrix (
  x integer DIMENSION,
  y integer DIMENSION,
  value float DEFAULT 0 );


-- simple checker boarding aggregation
SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0




SciQL array queries

CREATE ARRAY matrix (
  x integer DIMENSION,
  y integer DIMENSION,
  value float DEFAULT 0 );


-- group based aggregation to construct an unbounded vector
SELECT [x], sum(value) FROM matrix
  WHERE (x + y) % 2 = 0
  GROUP BY x;
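
To make the semantics of this value-based grouping concrete, here is a small Python equivalent. It is a sketch under assumptions: the array is modelled as a dict from (x, y) to value, and `checkerboard_sums` is a hypothetical helper, not part of SciQL.

```python
from collections import defaultdict
from typing import Dict, Tuple


def checkerboard_sums(cells: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Sum the values on the 'even' checkerboard squares, grouped by x."""
    sums: Dict[int, float] = defaultdict(float)
    for (x, y), value in cells.items():
        if (x + y) % 2 == 0:      # WHERE (x + y) % 2 = 0
            sums[x] += value      # GROUP BY x, sum(value)
    return dict(sums)


cells = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0, (0, 2): 5.0}
vector = checkerboard_sums(cells)
```

The result is keyed by x alone, which mirrors the `[x]` in the select list: the query produces a one-dimensional array indexed by x.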

SciQL array views
CREATE ARRAY vmatrix (
  x integer DIMENSION[-1:5],
  y integer DIMENSION[-1:5],
  value float DEFAULT -1 )
AS SELECT x, y, value FROM matrix;

               -1       -1      -1     -1
               -1        0      0      -1
               -1        0      0      -1
               -1       -1      -1     -1




SciQL tiling examples
              V0,3    V1,3     V2,3   V3,3


              V0,2    V1,2     V2,2   V3,2


              V0,1    V1,1     V2,1   V3,1

Anchor
Point         V0,0    V1,0     V2,0   V3,0




         SELECT x, y, avg(value)
         FROM matrix
         GROUP BY matrix[x:1:x+2][y:1:y+2];
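
A Python sketch of what this structural grouping computes, under two assumptions: the stop bound of `[x:1:x+2]` is exclusive, so each tile is 2×2, and cells that fall outside the array are ignored. Every cell serves as an anchor point, so tiles overlap. The helper `tile_averages` is hypothetical.

```python
from typing import Dict, Tuple

Cells = Dict[Tuple[int, int], float]


def tile_averages(cells: Cells, width: int = 2, height: int = 2) -> Cells:
    """Average over the width x height tile anchored at every cell."""
    result: Cells = {}
    for (ax, ay) in cells:
        members = [cells[(x, y)]
                   for x in range(ax, ax + width)
                   for y in range(ay, ay + height)
                   if (x, y) in cells]          # skip cells outside the array
        result[(ax, ay)] = sum(members) / len(members)
    return result


# A 4x4 array like the V0,0 .. V3,3 grid on the slide, with value = x + y.
matrix = {(x, y): float(x + y) for x in range(4) for y in range(4)}
averages = tile_averages(matrix)
```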


SciQL tiling examples
                 V0,3    V1,3     V2,3   V3,3


                 V0,2    V1,2     V2,2   V3,2


                 V0,1    V1,1     V2,1   V3,1

Anchor
Point            V0,0    V1,0     V2,0   V3,0




         SELECT x, y, avg(value)
         FROM matrix
         GROUP BY DISTINCT matrix[x:1:x+2][y:1:y+2];
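
The DISTINCT variant can be sketched in Python as well. The assumptions are stated loudly: DISTINCT is taken to mean that tiles do not overlap, anchors start at the lowest corner of the array, and the stop bound is exclusive. The helper `distinct_tile_averages` is hypothetical.

```python
from typing import Dict, Tuple

Cells = Dict[Tuple[int, int], float]


def distinct_tile_averages(cells: Cells, width: int = 2, height: int = 2) -> Cells:
    """Average over non-overlapping width x height tiles."""
    x0 = min(x for x, _ in cells)
    y0 = min(y for _, y in cells)
    result: Cells = {}
    for (ax, ay) in cells:
        # Only every width-th / height-th cell is an anchor point.
        if (ax - x0) % width != 0 or (ay - y0) % height != 0:
            continue
        members = [cells[(x, y)]
                   for x in range(ax, ax + width)
                   for y in range(ay, ay + height)
                   if (x, y) in cells]
        result[(ax, ay)] = sum(members) / len(members)
    return result


matrix = {(x, y): float(x + y) for x in range(4) for y in range(4)}
tiles = distinct_tile_averages(matrix)
```

On a 4×4 array this yields four tiles instead of sixteen, one per 2×2 block.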


SciQL tiling examples
          V0,3    V1,3     V2,3   V3,3

Anchor
Point     V0,2    V1,2     V2,2   V3,2


          V0,1    V1,1     V2,1   V3,1
   null

          V0,0    V1,0     V2,0   V3,0
   null                                  null



SELECT x, y, avg(value)
FROM matrix
GROUP BY DISTINCT matrix[x-1:1:x+1][y:1:y+2];


SciQL tiling examples
               V0,3    V1,3     V2,3   V3,3

Anchor
Point          V0,2    V1,2     V2,2   V3,2


               V0,1    V1,1     V2,1   V3,1


               V0,0    V1,0     V2,0   V3,0




         SELECT x, y, avg(value)
         FROM matrix
         GROUP BY matrix[x][y],
          matrix[x-1][y], matrix[x+1][y],
          matrix[x][y-1], matrix[x][y+1];
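
Listing individual cells in the GROUP BY gives a cross-shaped neighbourhood. A Python sketch, assuming that neighbours outside the array are simply left out of the average; `cross_averages` is a hypothetical helper.

```python
from typing import Dict, Tuple

Cells = Dict[Tuple[int, int], float]

# The anchor cell plus its left, right, lower and upper neighbour.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def cross_averages(cells: Cells) -> Cells:
    """Average each cell with its four direct neighbours."""
    result: Cells = {}
    for (x, y) in cells:
        members = [cells[(x + dx, y + dy)]
                   for dx, dy in CROSS
                   if (x + dx, y + dy) in cells]
        result[(x, y)] = sum(members) / len(members)
    return result


matrix = {(x, y): float(x + y) for x in range(4) for y in range(4)}
smoothed = cross_averages(matrix)
```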
SciQL, A Query Language for Science Applications


•  Seamless integration of array-, set-, and sequence-
   semantics.
•  Dimension constraints as a declarative means for
   indexed access to array cells.
•  Structural grouping to generalize the value-based
   grouping towards selective access to groups of cells
   based on positional relationships for aggregation.




Seismology use case
Rietbrock: Chile earthquake
  … 2TB of wave fronts
  … filter by sta/lta
  … remove false positives
  … window-based 3 min cuts
  … heuristic tests
  … interactive response required …


How can a database system help?
  Scanning 2 TB on a modern PC takes >3 hours

Use case, a SciQL dream
Rietbrock: Chile earthquake
create array mseed (
 tick   timestamp dimension[timestamp '2010':*],
 data decimal(8,6),
 station string );




Use case, a SciQL dream
Rietbrock: … filter by sta/lta


-- average by window of 5 seconds
select A.tick, avg(A.data)
from mseed A
group by A[tick:1:tick + 5 seconds]




Use case, a SciQL dream
Rietbrock: … filter by sta/lta
select A.tick
from mseed A, mseed B
where A.tick = B.tick
group by A[tick:tick + 5 seconds],
  B[tick:tick + 15 seconds]
having avg(A.data) / avg(B.data) > delta;
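
The STA/LTA filter behind this query can be sketched in plain Python. Assumptions: windows are measured in samples rather than seconds, both windows start at the same tick (as in `A.tick = B.tick`), and `sta_lta_triggers` is a hypothetical helper, not a library routine.

```python
from typing import List


def sta_lta_triggers(samples: List[float], short: int, long: int,
                     delta: float) -> List[int]:
    """Return the start positions where avg(short window) / avg(long window) > delta."""
    triggers: List[int] = []
    for start in range(len(samples) - long + 1):
        sta = sum(samples[start:start + short]) / short   # short-term average
        lta = sum(samples[start:start + long]) / long     # long-term average
        if lta > 0 and sta / lta > delta:
            triggers.append(start)
    return triggers


# Quiet signal with a short burst at positions 10..12.
signal = [1.0] * 10 + [9.0] * 3 + [1.0] * 10
events = sta_lta_triggers(signal, short=3, long=9, delta=2.0)
```

A window that starts inside the burst has a high short-term average relative to its long-term average, so only those start positions are reported.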




Use case, a SciQL dream
Rietbrock: … filter by sta/lta
create view candidates(
  station string,
  tick timestamp,
  ratio float ) as
select A.station, A.tick, avg(A.data) / avg(B.data) as ratio
  from mseed A, mseed B
  where A.tick = B.tick
  group by A[tick:tick + 5 seconds],
   B[tick:tick + 15 seconds]
  having avg(A.data) / avg(B.data) > delta;

Use case, a SciQL dream
Rietbrock: … remove false positives
-- remove isolated errors by direct environment
-- using wave propagation statistics


create table neighbors(
 head string,
 tail string,
 delay timestamp,
 weight float)

Use case, a SciQL dream
Rietbrock: … remove false positives
select A.tick, B.tick
  from candidates A, candidates B, neighbors N
 where A.station = N.head
 and B.station = N.tail
 and B.tick = A.tick + N.delay
 and B.ratio * N.weight < A.ratio;




Use case, a SciQL dream
Rietbrock: … remove false positives
delete from candidates
 where tick in (
  select A.tick
  from candidates A, candidates B, neighbors N
  where A.station = N.head
  and B.station = N.tail
  and B.tick = A.tick + N.delay
  and B.ratio * N.weight < A.ratio );



Use case, a SciQL dream
Rietbrock: … window-based 3 min cuts
  … heuristic tests


select B.station, myfunction(B.data)
  from candidates A, mseed B
 where A.tick = B.tick
 group by distinct B[tick:tick + 3 minutes];


-- using a User Defined Function written in C.


Use case
Rietbrock: … interactive response required …




The query over 2TB of seismic data will be
 handled before he finishes his coffee.

Status
•  The language definition is ‘finished’
•  The grammar is included in SQL parser
•  Semantic checks added to SQL parser
•  A test suite is being built
•  Runtime support features and software stack
•  …
•  Exposure to real life cases and external libraries




Science DBMS landscape
                    MonetDB 5.23                          SciDB 0.5               Rasdaman
Architecture        Server approach                       Server approach         Plugin (Oracle, DB2, Informix, MySQL, PostgreSQL)
Open source         Mozilla License                       GPL 3.0, commercial     GPL 3.0, dual license
Downloads           >12.000 / month                       Tens up to now          ??
SQL                 SQL 2003                              ??                      SQL92++
Interoperability    {JO}DBC, C(++), Python, …             C++ UDF                 C++, Java, OGC
Array language      SciQL                                 AQL                     RASQL
Array model         Fixed+variable bounds                 Fixed arrays            Fixed+variable bounds
Science             Linked libraries                      Linked libraries        Linked libraries
Foreign files       Vaults of CSV, FITS, NETCDF, MSEED    ??                      TIFF, PNG, JPG, …, CSV, NETCDF, HDF4
Distribution        50-200 node cluster                   4 node cluster          20-node
Distribution tech   Dynamic partial replication           Static fragmentation    Static fragmentation
Executor            Various schemes                       Map-reduce              Tile streaming
Largest demo        SkyServer SDSS 6, 3 TB                ---                     12 TB, IGN-F (on PostgreSQL)
Storage tuning      Query adaptive                        Schema definitions      Workload driven
Optimization        Heuristics + cost based               ??                      Heuristics + cost based
PDF
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
PPT
Data and Knowledge Evolution
PPS
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
PPS
Access Control for RDF graphs using Abstract Models
PDF
Arrays in Databases, the next frontier?
PPS
Abstract Access Control Model for Dynamic RDF Datasets
PPTX
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
PDF
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
A Contextualized Knowledge Repository for Open Data about Trentino
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
Towards Enabling Probabilistic Databases for Participatory Sensing
Privacy-Preserving Schema Reuse
Pay-as-you-go Reconciliation in Schema Matching Networks
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
On the need for a W3C community group on RDF Stream Processing
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
SciQL, Bridging the Gap between Science and Relational DBMS
CLODA: A Crowdsourced Linked Open Data Architecture
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Data and Knowledge Evolution
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Access Control for RDF graphs using Abstract Models
Arrays in Databases, the next frontier?
Abstract Access Control Model for Dynamic RDF Datasets
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...

Recently uploaded (20)

PDF
Dell Pro Micro: Speed customer interactions, patient processing, and learning...
PDF
MENA-ECEONOMIC-CONTEXT-VC MENA-ECEONOMIC
PDF
The AI Revolution in Customer Service - 2025
PDF
Connector Corner: Transform Unstructured Documents with Agentic Automation
PPTX
Presentation - Principles of Instructional Design.pptx
PDF
Introduction to MCP and A2A Protocols: Enabling Agent Communication
PDF
Auditboard EB SOX Playbook 2023 edition.
PDF
Data Virtualization in Action: Scaling APIs and Apps with FME
PDF
4 layer Arch & Reference Arch of IoT.pdf
PPTX
SGT Report The Beast Plan and Cyberphysical Systems of Control
PDF
Examining Bias in AI Generated News Content.pdf
PDF
Build Real-Time ML Apps with Python, Feast & NoSQL
PDF
NewMind AI Weekly Chronicles – August ’25 Week IV
PPTX
AI-driven Assurance Across Your End-to-end Network With ThousandEyes
PDF
Electrocardiogram sequences data analytics and classification using unsupervi...
PDF
Lung cancer patients survival prediction using outlier detection and optimize...
PDF
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
PDF
Early detection and classification of bone marrow changes in lumbar vertebrae...
PPTX
Module 1 Introduction to Web Programming .pptx
PPTX
Build automations faster and more reliably with UiPath ScreenPlay
Dell Pro Micro: Speed customer interactions, patient processing, and learning...
MENA-ECEONOMIC-CONTEXT-VC MENA-ECEONOMIC
The AI Revolution in Customer Service - 2025
Connector Corner: Transform Unstructured Documents with Agentic Automation
Presentation - Principles of Instructional Design.pptx
Introduction to MCP and A2A Protocols: Enabling Agent Communication
Auditboard EB SOX Playbook 2023 edition.
Data Virtualization in Action: Scaling APIs and Apps with FME
4 layer Arch & Reference Arch of IoT.pdf
SGT Report The Beast Plan and Cyberphysical Systems of Control
Examining Bias in AI Generated News Content.pdf
Build Real-Time ML Apps with Python, Feast & NoSQL
NewMind AI Weekly Chronicles – August ’25 Week IV
AI-driven Assurance Across Your End-to-end Network With ThousandEyes
Electrocardiogram sequences data analytics and classification using unsupervi...
Lung cancer patients survival prediction using outlier detection and optimize...
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
Early detection and classification of bone marrow changes in lumbar vertebrae...
Module 1 Introduction to Web Programming .pptx
Build automations faster and more reliably with UiPath ScreenPlay

Arrays in database systems, the next frontier?

  • 1. Arrays in database systems, the next frontier ? Martin Kersten CWI NLeSC 9 Nov 2011
  • 2. “We can't solve problems by using the same kind of thinking we used when we created them.”
  • 3. Agenda
       - A crash course on column-stores
       - Column stores for science applications
       - The SciQL array query language
  • 4. The world of column stores: Motivation
       Relational DBMSs have dominated since the late 1970s / 1980s
       - Transactional workloads (OLTP, row-wise access)
       - I/O based processing
       - Ingres, PostgreSQL, MySQL, Oracle, SQL Server, DB2, …
       Column stores have dominated product development since 2005
       - Data warehouses and business intelligence applications
       - Startups: Infobright, Aster Data, Greenplum, LucidDB, …
       - Commercial: Microsoft, IBM, SAP, …
       MonetDB, the pioneer
  • 5. The world of column stores: Workload changes: Transactions (OLTP) vs ...
  • 6. The world of column stores: Workload changes: ... vs OLAP, BI, Data Mining, ...
  • 7. The world of column stores: Databases hit The Memory Wall
       - Detailed and exhaustive analysis for different workloads using 4 RDBMSs by Ailamaki, DeWitt, Hill, and Wood in VLDB 1999: “DBMSs On A Modern Processor: Where Does Time Go?”
       - CPU is 60%-90% idle, waiting for memory:
           - L1 data stalls
           - L1 instruction stalls
           - L2 data stalls
           - TLB stalls
           - Branch mispredictions
           - Resource stalls
  • 8. The world of column stores: Hardware Changes: The Memory Wall
       Trip to memory = 1000s of instructions!
  • 9. Storing Relations in MonetDB
       [Figure: five columns, each with a Void head whose first OID is 1000.]
       Virtual OID: seqbase=1000 (increment=1)
  • 10. BAT Data Structure
       - BAT: binary association table (Head, Tail)
       - BUN: binary unit
       - Hash tables, T-trees, R-trees, ...
       - Head & Tail:
           - consecutive memory blocks (arrays)
           - memory-mapped files
       - BUN heap:
           - consecutive memory block (array)
           - memory-mapped file
       - Tail Heap:
           - best-effort duplicate elimination for strings (~ dictionary encoding)
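The virtual OID head from slide 9 can be illustrated with a small Python sketch. This is not MonetDB code: the class name `VoidBAT` and its methods are hypothetical, and the sample values are made up. Only the idea comes from the slides: head OIDs are dense (seqbase plus position), so they are computed rather than stored, and a lookup is plain array arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoidBAT:
    """Hypothetical stand-in for a BAT with a void (virtual) head."""
    seqbase: int      # OID of the first tail value
    tail: List[int]   # tail values in one consecutive array

    def oid_of(self, position: int) -> int:
        # The head is never materialized: OID = seqbase + position.
        return self.seqbase + position

    def fetch(self, oid: int) -> int:
        # Positional lookup: no search and no head column to read.
        return self.tail[oid - self.seqbase]


# Two columns of the same relation share the same seqbase (illustrative data).
age = VoidBAT(seqbase=1000, tail=[31, 47, 25, 62])
salary = VoidBAT(seqbase=1000, tail=[3000, 5200, 2800, 6100])

# Tuple reconstruction for OID 1002 is two array accesses.
row = (age.fetch(1002), salary.fetch(1002))
print(row)
```

Because every column of a relation is aligned on the same OID sequence, stitching a tuple back together needs no join structure in this sketch, only the shared offset.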
  • 11. MonetDB Front-end: SQL
       - SQL 2003
       - Parse SQL into logical n-ary relational algebra tree
       - Translate n-ary relational algebra into logical 2-ary relational algebra
       - Turn logical 2-ary plan into physical 2-ary plan (MAL program)
       - Front-end specific strategic optimization:
           - Heuristic optimization during all three previous steps
       - Primary key and distinct constraints:
           - Create and maintain hash indices
       - Foreign key constraints:
           - Create and maintain foreign key join indices
  • 12. MonetDB Front-end: SQL
       EXPLAIN SELECT a, z FROM t, s WHERE t.c = s.x;

       function user.s2_1():void;
       barrier _73 := language.dataflow();
           _2:bat[:oid,:int] := sql.bind("sys","t","c",0);
           _7:bat[:oid,:int] := sql.bind("sys","s","x",0);
           _10 := bat.reverse(_7);
           _11 := algebra.join(_2,_10);
           _13 := algebra.markT(_11,0@0);
           _14 := bat.reverse(_13);
           _15:bat[:oid,:int] := sql.bind("sys","t","a",0);
           _17 := algebra.leftjoin(_14,_15);
           _18 := bat.reverse(_11);
           _19 := algebra.markT(_18,0@0);
           _20 := bat.reverse(_19);
           _21:bat[:oid,:int] := sql.bind("sys","s","z",0);
           _23 := algebra.leftjoin(_20,_21);
       exit _73;
           _24 := sql.resultSet(2,1,_17);
           sql.rsColumn(_24,"sys.t","a","int",32,0,_17);
           sql.rsColumn(_24,"sys.s","z","int",32,0,_23);
           _33 := io.stdout();
           sql.exportResult(_33,_24);
       end s2_1;
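The shape of the plan on slide 12 can be mimicked in a few lines of Python. This is only a sketch of the column-at-a-time idea, not a translation of the MAL operators: the helper names `join_positions` and `fetch` are hypothetical, the data is invented, and a hash join is assumed where the real kernel would pick its own algorithm. The join touches only the two join columns (`t.c` and `s.x`) and the projected columns (`t.a` and `s.z`) are fetched afterwards by position.

```python
def join_positions(left, right):
    """Return (left_pos, right_pos) pairs with equal values, via a hash table on `right`."""
    index = {}
    for pos, value in enumerate(right):
        index.setdefault(value, []).append(pos)
    return [(lp, rp) for lp, value in enumerate(left) for rp in index.get(value, [])]


def fetch(column, positions):
    """Project a column through a list of positions (a positional join)."""
    return [column[p] for p in positions]


# Each table is a set of aligned columns (illustrative data).
t = {"a": [10, 20, 30], "c": [1, 2, 3]}
s = {"x": [3, 1, 1], "z": [300, 100, 101]}

pairs = join_positions(t["c"], s["x"])           # join on t.c = s.x
a_out = fetch(t["a"], [lp for lp, _ in pairs])   # project t.a
z_out = fetch(s["z"], [rp for _, rp in pairs])   # project s.z
result = list(zip(a_out, z_out))
print(result)
```

Every intermediate (`pairs`, `a_out`, `z_out`) is fully materialized before the next step runs, which mirrors the operator-at-a-time style of the plan.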
  • 13. MonetDB/5 Back-end: MAL
       - MAL: Monet Assembly Language
       - Textual interface
       - Interpreted language
       - Designed as system interface language
       - Reduced, concise syntax
       - Strict typing
       - Meant for automatic generation and parsing/rewriting/processing
       - Not meant to be typed by humans
       - Efficient parser
       - Low overhead
       - Inherent support for tactical optimization: MAL -> MAL
       - Support for optimizer plug-ins
       - Support for runtime schedulers
       - Binary-algebra core
       - Flow control (MAL is computationally complete)
  • 14. Processing Model (MonetDB Kernel)
       - Bulk processing:
           - full materialization of all intermediate results
       - Binary (i.e., 2-column) algebra core:
           - select, join, semijoin, outerjoin
           - union, intersection, diff (BAT-wise & column-wise)
           - group, count, max, min, sum, avg
           - reverse, mirror, mark
       - Runtime operational optimization:
           - Choosing optimal algorithm & implementation according to input properties and system status
  • 15. Processing Model (MonetDB Kernel)
       - Heavy use of code expansion to reduce cost:
           - 1 algebra operator: select()
           - 3 overloaded operators: select(“=”,value), select(“between”,L,H), select(“fcn”,parm)
           - 10 operator algorithms: scan, hash-lookup, bin-search, bin-tree, pos-lookup
           - ~1500(!) routines (macro expansion): scan_range_select_oid_int(), hash_equi_select_void_str(), …
       - ~1500 selection routines
       - 149 unary operations
       - 335 join/group operations
       - ...
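Runtime operational optimization, as described on slides 14 and 15, means one logical operator is backed by several algorithms and the kernel picks one from the properties of its input. The Python sketch below shows the idea for a range select with just two algorithms. The function name `select_range` and the `is_sorted` flag are assumptions made for illustration; the real kernel tracks such properties itself and chooses among many more specialized routines.

```python
from bisect import bisect_left, bisect_right


def select_range(tail, low, high, *, is_sorted=False):
    """Return the positions whose value lies in the inclusive range [low, high]."""
    if is_sorted:
        # Sorted input: two binary searches give one contiguous run of positions.
        first = bisect_left(tail, low)
        last = bisect_right(tail, high)
        return list(range(first, last))
    # No useful property known: fall back to a full scan.
    return [pos for pos, value in enumerate(tail) if low <= value <= high]


print(select_range([1, 3, 5, 7, 9], 3, 7, is_sorted=True))
print(select_range([9, 3, 7, 1, 5], 3, 7))
```

Both branches return the same kind of result (a list of qualifying positions), so the caller does not need to know which algorithm ran.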
  • 16. The Software Stack
       - Front-ends: XQuery, SQL 03, SciQL, RDF
       - Optimizers
       - Back-end(s): MonetDB 4, MonetDB 5
       - Kernel: MonetDB kernel
  • 17. The Software Stack
       - Front-ends (XQuery, SQL 03): strategic optimization, producing MAL
       - Optimizers: tactical optimization (MAL -> MAL rewrites), producing MAL
       - Back-end(s): MonetDB 4, MonetDB 5
       - Kernel (MonetDB kernel): runtime operational optimization
  • 18. MonetDB vs Traditional DBMS Architecture
       - Architecture-conscious query processing vs magnetic disk I/O conscious processing
           - Data layout, algorithms, cost models
       - RISC relational algebra (operator-at-a-time) vs tuple-at-a-time iterator model
           - Faster through simplicity: no tuple expression interpreter
       - Multi-model (ODMG, SQL, XML/XQuery, ..., RDF/SPARQL) vs relational with bolt-on subsystems
           - Columns as the building block for complex data structures
       - Decoupling of transactions from execution/buffering vs ARIES integrated into execution/buffering/indexing
           - ACID, but not ARIES. Pay-as-you-need transaction overhead.
       - Run-time indexing and query optimization vs static DBA/workload-driven optimization & indexing
           - Extensible optimizer framework
           - Cracking, recycling, sampling-based runtime optimization
  • 19. Evolution: It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change. Charles Darwin (1809 - 1882)
  • 20. Agenda
       - A crash course on column-stores
       - Column stores for science applications
       - The SciQL array query language
  • 22. SkyServer Schema
       [Figure: schema diagram; one table is labelled 446 columns, >585 million rows, another 6 columns, >20 billion rows.]
  • 23. Recycler: motivation & idea
       “An architecture for recycling intermediates in a column-store”. Ivanova, Kersten, Nes, Goncalves. ACM TODS 35(4), Dec. 2010
       Motivation:
       - Scientific databases, data analytics
       - Terabytes of data (observational, transactional)
       - Prevailing read-only workload
       - Ad-hoc queries with commonalities
       Background:
       - Operator-at-a-time execution paradigm
           - Automatic materialization of intermediates
       - Canonical column-store organization
           - Intermediates have reduced dimensionality and finer granularity
           - Simplified overlap analysis
       Recycling idea:
       - Instead of garbage collecting, keep the intermediates and reuse them
       - Speed up query streams with commonalities
       - Low cost and self-organization
  • 24. Recycler: fit into MonetDB
       [Figure: the SQL and XQuery front-ends emit MAL; the Tactical Optimizer, which includes the Recycler Optimizer, rewrites it into MAL; Run-time Support in the MonetDB Kernel of the MonetDB Server handles Admission & Eviction for the Recycle Pool.]
       MAL fragment shown in the figure:
           function user.s1_2(A0:date, ...):void;
               X5 := sql.bind("sys","lineitem",...);
               X10 := algebra.select(X5,A0);
               X12 := sql.bindIdx("sys","lineitem",...);
               X15 := algebra.join(X10,X12);
               X25 := mtime.addmonths(A1,A2);
               ...
  • 25. Recycler: instruction matching
       Run time comparison of
       - instruction types
       - argument values
       Exact matching:
           Y3 := sql.bind("sys","orders","o_orderdate",0);
           X1 := sql.bind("sys","orders","o_orderdate",0);
           …
       Name | Value    | Data type        | Size
       X1   | 10       | :bat[:oid,:date] |
       T1   | “sys”    | :str             |
       T2   | “orders” | :str             |
       …
  • 26. Recycler: instruction subsumption
           Y3 := algebra.select(X1,20,45);
           X3 := algebra.select(X1,10,80);
           …
           X5 := algebra.select(X1,20,60);
       Figure label: X5
       Name | Value | Data type       | Size
       X1   | 10    | :bat[:oid,:int] | 2000
       X3   | 130   | :bat[:oid,:int] | 700
       X5   | 150   | :bat[:oid,:int] | 350
       …
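Exact matching and subsumption (slides 25 and 26) can be sketched together as a tiny cache of range selections. This is an illustration only, not the recycler from the paper: the class `RecyclePool`, its key layout, inclusive range bounds, and the "smallest covering intermediate wins" rule are all assumptions made here, and admission and eviction are left out entirely.

```python
class RecyclePool:
    """Hypothetical pool of materialized range selections, keyed by (column, low, high)."""

    def __init__(self):
        self._pool = {}

    def select(self, name, column, low, high):
        """Return (rows, source); source is the pool key that was reused, or None."""
        key = (name, low, high)
        if key in self._pool:                      # exact match: same instruction and arguments
            return self._pool[key], key
        # Subsumption: any cached range on the same column that covers the request.
        covering = [(len(rows), k) for k, rows in self._pool.items()
                    if k[0] == name and k[1] <= low and high <= k[2]]
        if covering:
            _, source = min(covering)              # assumed policy: smallest covering intermediate
            rows = [(oid, v) for oid, v in self._pool[source] if low <= v <= high]
        else:
            source = None                          # nothing reusable: scan the base column
            rows = [(oid, v) for oid, v in enumerate(column) if low <= v <= high]
        self._pool[key] = rows                     # keep the intermediate instead of discarding it
        return rows, source


pool = RecyclePool()
x1 = list(range(100))                              # illustrative base column
_, first = pool.select("X1", x1, 10, 80)           # computed from the base column
_, second = pool.select("X1", x1, 20, 60)          # answered from the (10, 80) intermediate
rows, third = pool.select("X1", x1, 20, 45)        # answered from the smaller (20, 60) one
print(first, second, third, len(rows))
```

With the argument values from slide 26, the request for the range 20 to 45 is served from the intermediate for 20 to 60 rather than from the larger one for 10 to 80, which is consistent with the X5 label in the figure.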
  • 27. Recycler: SkyServer evaluation
       Sloan Digital Sky Survey / SkyServer http://cas.sdss.org
       - 100 GB subset of DR4
       - 100-query batch from January 2008 log
       - 1.5 GB intermediates, 99% reuse
       - Join intermediates are the major consumer of memory and the major contributor to savings
  • 28. Agenda
       - A crash course on column-stores
       - Column stores for science applications
       - The SciQL array query language
  • 29. What is an array?
       An array is a systematic arrangement of objects addressed by dimension values.
           Get(A, X, Y,…) => Value
           Set(A, X, Y,…) <= Value
       There are many species: vector, bit array, dynamic array, parallel array, sparse array, variable length array, jagged array
  • 30. Who needs them anyway?
       - Seismology – partial time-series
       - Climate simulation – temporal ordered grid
       - Astronomy – temporal ordered images
       - Remote sensing – image processing
       - Social networks – graph algorithms
       - Genomics – ordered strings
       Scientists ‘love them’: MSEED, NETCDF, FITS, CSV, ...
  • 31. Arrays in DBMS
       - Relational prototype built on arrays, Peterlee IS (1975)
       - Persistent programming languages, Astral (1980), Plain (1980)
       - Object-orientation and persistent languages were the make-believe to handle them, O2 (1992)
  • 32. PostgreSQL 8.3
       Array declarations:
           CREATE TABLE sal_emp ( name text, pay_by_quarter integer[], schedule text[][] );
           CREATE TABLE tictactoe ( squares integer[3][3] );
       Array operations: denotation ([]), contains (@>), is contained in (<@), append, concat (||), dimension, lower, upper, prepend, to-string, from-string
       Array constraints: none, no enforcement of dimensions.
  • 33. MySQL
       From the MySQL forum, May 2010:
           > How to store multiple values in a single field? Is there any array data type concept in mysql?
           > As Jörg said, "Multiple values in a single field" would be an explicit violation of the relational model...
       Is there any experience beyond encoding it as blobs?
  • 34. Rasdaman
       - Breaks large C++ arrays (rasters) into disjoint chunks
       - Maps chunks into large binary objects (blob)
       - Provides a function interface to access them
       - RASQL, a SQL92 extension
       - Known to work up to 12 TB
  • 35. SciDB
       - Breaks large C++ arrays (rasters) into overlapping chunks
       - Built storage manager from scratch
       - Map-reduce processing model
       - Provides a function interface to access them
       - AQL, a crippled SQL92
  • 36. What is the problem?
       - Appropriate array denotations?
       - Functionally complete operation set?
       - Scale?
       - Size limitations due to (blob) representations?
       - Community awareness?
  • 37. MonetDB SciQL
       SciQL (pronounced ‘cycle’)
       - A backward compatible extension of SQL’03
       - Symbiosis of relational and array paradigm
       - Flexible structure-based grouping
       - Capitalizes on the MonetDB physical array storage
       - Recycling, an adaptive ‘materialized view’
       - Zero-cost attachment contract for cooperative clients
       http://www.cwi.nl/~mk/SciQL.pdf
  • 38. Table vs arrays
       CREATE TABLE tmp
       - A collection of tuples
       - Indexed by a (primary) key
       - Default handling
       - Explicitly created using INS/UPD/DEL
  • 39. Table vs arrays
       CREATE TABLE tmp                     | CREATE ARRAY tmp
       A collection of tuples               | A collection of a priori defined tuples
       Indexed by a (primary) key           | Indexed by dimension expressions
       Default handling                     | Implicitly defined by default value
       Explicitly created using INS/UPD/DEL | To be updated with INS/DEL/UPD
  • 40. SciQL examples
           CREATE TABLE matrix ( x integer, y integer, value float, PRIMARY KEY (x,y) );
           INSERT INTO matrix VALUES (0,0,0),(0,1,0),(1,1,0),(1,0,0);
       x | y | value
       0 | 0 | 0
       0 | 1 | 0
       1 | 1 | 0
       1 | 0 | 0
  • 41. SciQL examples
       Table:
           CREATE TABLE matrix ( x integer, y integer, value float, PRIMARY KEY (x,y) );
           INSERT INTO matrix VALUES (0,0,0),(0,1,0),(1,1,0),(1,0,0);
       Array:
           CREATE ARRAY matrix ( x integer DIMENSION[2], y integer DIMENSION[2], value float DEFAULT 0);
       [Figure: the table lists the rows (0,0,0), (0,1,0), (1,1,0), (1,0,0); the array is a 2×2 grid of cells holding 0, with null shown outside its bounds.]
  • 42. SciQL examples
       Table:
           CREATE TABLE matrix ( x integer, y integer, value float, PRIMARY KEY (x,y) );
           DELETE FROM matrix WHERE y=1
       Array:
           CREATE ARRAY matrix ( x integer DIMENSION[2], y integer DIMENSION[2], value float DEFAULT 0);
           DELETE FROM matrix WHERE y=1
       A hole in the array
       [Figure: the table keeps the rows (0,0,0) and (1,0,0); in the array the cells with y=1 become null while the cells with y=0 keep 0.]
  • 43. SciQL examples
       Table:
           CREATE TABLE matrix ( x integer, y integer, value float, PRIMARY KEY (x,y) );
           INSERT INTO matrix VALUES (0,1,1), (1,1,2)
       Array:
           CREATE ARRAY matrix ( x integer DIMENSION[2], y integer DIMENSION[2], value float DEFAULT 0);
           INSERT INTO matrix VALUES (0,1,1), (1,1,2)
       [Figure: the table holds the rows (0,0,0), (1,0,0), (0,1,1), (1,1,2); in the array the cells with y=1 now hold 1 and 2, and the cells with y=0 hold 0.]
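The contrast drawn on slides 39 to 43 can be emulated in Python. This is only a sketch of the semantics as the slides present them, not of how SciQL is implemented: the class `Array2D` is hypothetical, `None` stands in for null, and `DIMENSION[2]` is assumed to mean the index values 0 and 1. The point is that array cells exist from the moment the array is created, a DELETE leaves a hole rather than removing a cell, and an INSERT overwrites a cell rather than adding one.

```python
class Array2D:
    """Hypothetical 2-D array whose cells all exist up front with a default value."""

    def __init__(self, nx, ny, default):
        self.cells = {(x, y): default for x in range(nx) for y in range(ny)}

    def delete(self, predicate):
        # DELETE punches holes: the cell stays, its value becomes null (None).
        for (x, y) in list(self.cells):
            if predicate(x, y):
                self.cells[(x, y)] = None

    def insert(self, rows):
        # INSERT overwrites the addressed cells; the shape does not change.
        for x, y, value in rows:
            self.cells[(x, y)] = value


# Table semantics for comparison: rows exist only if inserted, DELETE removes them.
table = {(0, 0): 0.0, (0, 1): 0.0, (1, 1): 0.0, (1, 0): 0.0}
table = {key: v for key, v in table.items() if key[1] != 1}   # DELETE ... WHERE y=1

matrix = Array2D(nx=2, ny=2, default=0.0)
matrix.delete(lambda x, y: y == 1)                 # DELETE ... WHERE y=1
after_delete = dict(matrix.cells)
matrix.insert([(0, 1, 1.0), (1, 1, 2.0)])          # INSERT ... VALUES (0,1,1), (1,1,2)

print(len(table), after_delete, matrix.cells)
```

After the delete the table has two rows left, while the array still has four cells, two of which are holes.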
  • 44. SciQL unbounded arrays
       Table:
           CREATE TABLE matrix ( x integer, y integer, value float, PRIMARY KEY (x,y) );
           INSERT INTO matrix VALUES (0,2,1), (0,1,2)
       Array:
           CREATE ARRAY matrix ( x integer DIMENSION, y integer DIMENSION, value float DEFAULT 0);
           INSERT INTO matrix VALUES (0,2,1), (0,1,2)
       [Figure: the table holds the rows (0,2,1) and (0,1,2); the array shows the inserted values 1 and 2 alongside cells holding the default 0.]
  • 45. SciQL Dimensions
       Unbounded dimensions:
           scalar-type DIMENSION
       Bounded dimensions:
           scalar-type DIMENSION[stop]
           scalar-type DIMENSION[first: step: stop]
           scalar-type DIMENSION[*: *: *]
           timestamp DIMENSION [ timestamp '2010-01-19' : *: timestamp '1' minute]
  • 46. SciQL table queries
           CREATE ARRAY matrix ( x integer DIMENSION, y integer DIMENSION, value float DEFAULT 0 );
           -- simple checker boarding aggregation
           SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
  • 47. SciQL array queries
           CREATE ARRAY matrix ( x integer DIMENSION, y integer DIMENSION, value float DEFAULT 0 );
           -- group based aggregation to construct an unbounded vector
           SELECT [x], sum(value) FROM matrix WHERE (x + y) % 2 = 0 GROUP BY x;
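The two queries on slides 46 and 47 can be traced by hand with a Python emulation. The 4×4 test array and its cell values (4x + y) are invented for the example; the filter `(x + y) % 2 == 0` and the grouping on `x` come from the slides. The first result is a single scalar, the second is one sum per value of `x`, which is what the `[x]` in the select list turns into a vector.

```python
from collections import defaultdict

# Illustrative 4x4 array: cell (x, y) holds 4*x + y.
matrix = {(x, y): float(4 * x + y) for x in range(4) for y in range(4)}

# Slide 46: SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
checker_sum = sum(v for (x, y), v in matrix.items() if (x + y) % 2 == 0)

# Slide 47: SELECT [x], sum(value) ... WHERE (x + y) % 2 = 0 GROUP BY x
per_x = defaultdict(float)
for (x, y), v in matrix.items():
    if (x + y) % 2 == 0:
        per_x[x] += v

print(checker_sum, dict(per_x))
```

The dimension `x` plays the role of both a filter attribute and a grouping key here, which is the "symbiosis" of table and array views that the deck argues for.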
  • 48. SciQL array views
           CREATE ARRAY vmatrix ( x integer DIMENSION[-1:5], y integer DIMENSION[-1:5], value float DEFAULT -1 ) AS SELECT x, y, value FROM matrix;
       Figure:
           -1 -1 -1 -1
           -1  0  0 -1
           -1  0  0 -1
           -1 -1 -1 -1
  • 49. SciQL tiling examples
       Figure (with an anchor point marked):
           V0,3 V1,3 V2,3 V3,3
           V0,2 V1,2 V2,2 V3,2
           V0,1 V1,1 V2,1 V3,1
           V0,0 V1,0 V2,0 V3,0
           SELECT x, y, avg(value) FROM matrix GROUP BY matrix[x:1:x+2][y:1:y+2];
  • 50. SciQL tiling examples
       [Figure: the 4×4 grid V0,0 … V3,3 with an anchor point marked.]
           SELECT x, y, avg(value) FROM matrix GROUP BY DISTINCT matrix[x:1:x+2][y:1:y+2];
  • 51. SciQL tiling examples
       [Figure: the 4×4 grid V0,0 … V3,3 with an anchor point marked; positions outside the grid are shown as null.]
           SELECT x, y, avg(value) FROM matrix GROUP BY DISTINCT matrix[x-1:1:x+1][y:1:y+2];
  • 52. SciQL tiling examples
       [Figure: the 4×4 grid V0,0 … V3,3 with an anchor point marked.]
           SELECT x, y, avg(value) FROM matrix GROUP BY matrix[x][y], matrix[x-1][y], matrix[x+1][y], matrix[x][y-1], matrix[x][y+1];
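Structural grouping, as used in the tiling queries on slides 49 and 50, can be approximated in Python. Several points here are assumptions rather than facts from the slides: the range `[x:1:x+2]` is read as the two positions x and x+1 (stop exclusive), cells that fall outside the array are skipped when averaging, and `DISTINCT` is read as "tiles do not overlap", implemented by stepping the anchor point by the tile size. The function name `tile_averages` and the test data are made up.

```python
def tile_averages(matrix, nx, ny, width, height, distinct=False):
    """Average each width x height tile, keyed by its anchor point (x, y)."""
    step_x, step_y = (width, height) if distinct else (1, 1)
    result = {}
    for ax in range(0, nx, step_x):
        for ay in range(0, ny, step_y):
            # Collect the cells of the tile anchored at (ax, ay); skip positions outside the array.
            cells = [matrix[(x, y)]
                     for x in range(ax, ax + width)
                     for y in range(ay, ay + height)
                     if (x, y) in matrix]
            result[(ax, ay)] = sum(cells) / len(cells)
    return result


# Illustrative 4x4 array: cell (x, y) holds 4*x + y.
matrix = {(x, y): float(4 * x + y) for x in range(4) for y in range(4)}

overlapping = tile_averages(matrix, 4, 4, width=2, height=2)             # one tile per cell
disjoint = tile_averages(matrix, 4, 4, width=2, height=2, distinct=True)  # four disjoint tiles
print(len(overlapping), len(disjoint), disjoint)
```

Under these assumptions the plain `GROUP BY` produces one aggregate per anchor cell (16 here), while the `DISTINCT` variant produces one per disjoint tile (4 here).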
  • 53. SciQL, A Query Language for Science Applications
       - Seamless integration of array-, set-, and sequence-semantics.
       - Dimension constraints as a declarative means for indexed access to array cells.
       - Structural grouping to generalize the value-based grouping towards selective access to groups of cells based on positional relationships for aggregation.
  • 54. Seismology use case
       Rietbrock: Chile earthquake
       … 2TB of wave fronts
       … filter by sta/lta
       … remove false positives
       … window-based 3 min cuts
       … heuristic tests
       … interactive response required …
       How can a database system help? Scanning 2TB on a modern PC takes >3 hours
  • 55. Use case, a SciQL dream
       Rietbrock: Chile earthquake
           create array mseed (
               tick timestamp dimension[timestamp '2010':*],
               data decimal(8,6),
               station string );
  • 56. Use case, a SciQL dream
       Rietbrock: … filter by sta/lta
           -- average by window of 5 seconds
           select A.tick, avg(A.data)
           from mseed A
           group by A[tick:1:tick + 5 seconds]
  • 57. Use case, a SciQL dream
       Rietbrock: … filter by sta/lta
           select A.tick
           from mseed A, mseed B
           where A.tick = B.tick and avg(A.data) / avg(B.data) > delta
           group by A[tick:tick + 5 seconds], B[tick:tick + 15 seconds]
  • 58. Use case, a SciQL dream
       Rietbrock: … filter by sta/lta
           create view candidates( station string, tick timestamp, ratio float ) as
           select A.station, A.tick, avg(A.data) / avg(B.data) as ratio
           from mseed A, mseed B
           where A.tick = B.tick and avg(A.data) / avg(B.data) > delta
           group by A[tick:tick + 5 seconds], B[tick:tick + 15 seconds]
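The sta/lta filter on slides 57 and 58 compares the average over a short window with the average over a long window anchored at the same tick. A Python sketch under simplifying assumptions: one sample per second, both windows start at the tick and run forward (as in the `group by` clauses on the slides), windows are truncated at the end of the series, and the signal, the threshold `delta`, and the function names are invented for the example.

```python
def window_avg(data, start, length):
    """Average of data[start : start + length], truncated at the end of the series."""
    window = data[start:start + length]
    return sum(window) / len(window)


def sta_lta_candidates(data, short=5, long=15, delta=2.0):
    """Return (tick, ratio) for every tick whose short/long average ratio exceeds delta."""
    candidates = []
    for tick in range(len(data)):
        lta = window_avg(data, tick, long)
        if lta == 0:
            continue                      # avoid dividing by zero on a silent trace
        ratio = window_avg(data, tick, short) / lta
        if ratio > delta:
            candidates.append((tick, ratio))
    return candidates


# Illustrative trace: quiet background with a 5-second burst starting at tick 20.
signal = [1.0] * 20 + [10.0] * 5 + [1.0] * 20
candidates = sta_lta_candidates(signal)
print(candidates)
```

Only the ticks around the burst survive the filter, which is the small candidate set that the later slides clean up and cut into 3 minute windows.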
  • 59. Use case, a SciQL dream
       Rietbrock: … remove false positives
           -- remove isolated errors by direct environment
           -- using wave propagation statics
           create table neighbors( head string, tail string, delay timestamp, weight float)
  • 60. Use case, a SciQL dream
       Rietbrock: … remove false positives
           select A.tick, B.tick
           from candidates A, candidates B, neighbors N
           where A.station = N.head and B.station = N.tail
             and B.tick = A.tick + N.delay
             and B.ratio * N.weight < A.ratio;
  • 61. Use case, a SciQL dream
       Rietbrock: … remove false positives
           delete from candidates
           select A.tick
           from candidates A, candidates B, neighbors N
           where A.station = N.head and B.station = N.tail
             and B.tick = A.tick + N.delay
             and B.ratio * N.weight < A.ratio;
  • 62. Use case, a SciQL dream
       Rietbrock: … window-based 3 min cuts … heuristic tests
           select B.station, myfunction(B.data)
           from candidates A, mseed B
           where A.tick = B.tick
           group by distinct B[tick:tick + 3 minutes];
           -- using a User Defined Function written in C.
  • 63. Use case
       Rietbrock: … interactive response required …
       The query over 2TB of seismic data will be handled before he finishes his coffee.
  • 64. Status
       - The language definition is ‘finished’
       - The grammar is included in SQL parser
       - Semantic checks added to SQL parser
       - A test suite is being built
       - Runtime support features and software stack
       - …
       - Exposure to real life cases and external libraries
  • 67. Science DBMS landscape
                         | MonetDB 5.23                        | SciDB 0.5            | Rasdaman
       Architecture      | Server approach                     | Server approach      | Plugin (Oracle, DB2, Informix, Mysql, Postgresql)
       Open source       | Mozilla License                     | GPL 3.0, Commercial  | GPL 3.0, Dual license
       Downloads         | >12.000 /month                      | Tens up to now       | ??
       SQL               | SQL 2003                            | ??                   | SQL92++
       Interoperability  | {JO}DBC, C(++), Python, …           | C++ UDF              | C++, Java, OGC
       Array language    | SciQL                               | AQL                  | RASQL
       Array model       | Fixed+variable bounds               | Fixed arrays         | Fixed+variable bounds
       Science           | Linked libraries                    | Linked libraries     | Linked libraries
       Foreign files     | Vaults of csv, FITS, NETCDF, MSEED  | ??                   | Tiff, png, jpg, .., csv, NETCDF, HDF4
       Distribution      | 50-200 node cluster                 | 4 node cluster       | 20-node
       Distribution tech | Dynamic partial replication         | Static fragmentation | Static fragmentation
       Executor          | Various schemes                     | Map-reduce           | Tile streaming
       Largest demo      | Skyserver SDSS 6, 3TB               | ---                  | 12TB, IGN-F (on Postgresql)
       Storage tuning    | Query adaptive                      | Schema definitions   | Workload driven
       Optimization      | Heuristics + cost based             | ??                   | Heuristics + cost based