HugeTable: Application-Oriented
 Structure Data Storage System

      China Mobile Research Institute
         HugeTable Project Team
               Qian Ling
Agenda

 Motivations
 Hadoop, Hive & HBase
 HT Design & Development
 HT Applications
 Further Plans
Motivations
 Huge Data Volumes
    Total data volumes: Several PB per system
    Daily data volumes: Several TB per system
    Longer retention period: several months
    Big potential: 200% increase in some areas
 Multiple Application Areas
    BOSS BI NMS Internet ...
    Data Integration
 Traditional Application Model
    SQL support
    Fast Index Query
    Multiple Application support
    Sensitive data
    CRUD support
    Statistics & Reporting
 Data Warehouse + App Solution
    •Scalable
    •Highly Available
    •Reliable
    … Affordable
Hadoop: Raw Techniques

 HDFS: distributed file system with fault tolerance
 MapReduce: parallel programming environment over HDFS
 Similar to the situation of POSIX API + Local FS
 High Level Toolkits are initiated
   Yahoo: PIG/Latin
   Business.com: Cloudbase/Hadoop+JDBC
   China Mobile: BC-PDM
   Facebook: Hive/SQL
Hive: A Petabytes Scale Data Warehouse
                             Features:
                              •   Schema support
                              •   Pluggable Storage Engine I/F
                              •   SQL-to-MR translation
                              •   xDBC Driver
                              •   Tools: HQL Console
                              •   Admin: HWI


                             Usage Scenarios
                              • Reporting
                              • Ad hoc Analysis
                              • Machine Learning
                              • Others
                                  •Log analysis
                                  •Trend detection
                                Facebook has huge clusters
                                >1000 nodes
Source: ICDE 2010/Facebook
HBase: structured storage of sparse data for
 Hadoop
                               Features
                                •   ColumnFamilies
                                •   ACID
                                •   Optimized R/W
                                •   BigTable I/F + BU
                                •   Tools: HBase Shell
                                •   Admin: Jetty Based


                               Usage Scenarios
                                • Social Service
                                • MapReduce Analysis
                                • Content Repository
                                • Wiki, RSS
                                • Near Realtime Reporting
                                  & analytics
                                • Store web pages
                               … Replacing SQL Systems
Source: ApacheCon2009/ HBase
HugeTable: Application-Oriented Structure
Data Storage System
 Address the missing blocks
   Index store & Query Optimizer
   Access Control List
   Insert, Update and Delete
   Web-based Administration

[Diagram: HugeTable components]
   Tools, Clients, I/F, Admin
   HFile w/ CF, Index Store
   Data, config, FM, Log, Perf


 Build Solutions for Telco Applications
   Network Management System – NMS
   Value-added System – VAS
   Business Intelligence – BI
   Other areas
A Brief History of HugeTable

 HT-p1 (2008)
   1. HBase-based
   2. Partial xDBC/SQL support
   3. Integration HBase with ZK before official release
   4. Secondary Index
   5. Support Schema
   6. ACL support
   7. SQL console

 HT-p2 (2009)
   1. Connect Hive with HBase
   2. Global Indexing
   3. Support HFile, CF in Hive
   4. Secondary Index
   5. Multiple DB support
   6. ACL support
   7. MR & Scan I/F
   8. Loader Tools, HT-Client
   9. Admin Portal
   10. JDBC remote console

 HT-p3 (2010)
   1. Move to higher version of Hive, Hadoop and HBase
   2. New Storage Engine
   3. Fruitful external I/F
   4. Many other improvements
   5. Application Solution
HugeTable Building Blocks
[Diagram: layered stack, top to bottom]
 Applications
 HugeTable
 Building blocks (role: component)
   Storage, Computing: Hadoop Core
   KVStore: Hadoop HBase
   SQL-MR: Hadoop Hive
   Lock: Hadoop Zookeeper
   NMS: Cloud Master
   …
HBase as HugeTable Index Store
[Diagram: index creation, loading and query flow]
 Statements: Create Index, Drop Index; Select … using index xxx, Select … where idxcol
 Components: Index Meta Data, Query Engine, Load Service, HT Loader, Index Data (HBase)
 Flows: Find Index, Read Index, Write Index, Check Index
Index Store Implementation

  Primary Index: index into data file
  Secondary Index: index into primary index
  Exact match and Range scan
  Integrated with HiveQL and other modules
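
The two index levels above can be illustrated with a minimal in-memory sketch. This is not HugeTable code: the class `IndexStoreSketch`, its method names, and the use of sorted maps in place of HBase tables are all assumptions made for illustration. It only shows the idea of a secondary index pointing into a primary index, which in turn points into a data file, with exact match and range scan on the indexed value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: sorted maps stand in for index tables kept in HBase.
class IndexStoreSketch {
    // Primary index: row key -> byte offset of the record in a data file.
    private final TreeMap<String, Long> primary = new TreeMap<>();
    // Secondary index: indexed column value -> row keys in the primary index.
    private final TreeMap<String, TreeSet<String>> secondary = new TreeMap<>();

    // Register one record under both indexes.
    void put(String rowKey, long offset, String indexedValue) {
        primary.put(rowKey, offset);
        secondary.computeIfAbsent(indexedValue, v -> new TreeSet<>()).add(rowKey);
    }

    // Exact match: secondary index -> row keys -> data file offsets.
    List<Long> exactMatch(String value) {
        return resolve(secondary.getOrDefault(value, new TreeSet<>()));
    }

    // Range scan over indexed values in [from, to], both bounds inclusive.
    List<Long> rangeScan(String from, String to) {
        List<Long> offsets = new ArrayList<>();
        for (TreeSet<String> keys : secondary.subMap(from, true, to, true).values()) {
            offsets.addAll(resolve(keys));
        }
        return offsets;
    }

    // Follow row keys through the primary index to data file offsets.
    private List<Long> resolve(Iterable<String> rowKeys) {
        List<Long> offsets = new ArrayList<>();
        for (String key : rowKeys) {
            Long offset = primary.get(key);
            if (offset != null) {
                offsets.add(offset);
            }
        }
        return offsets;
    }
}
```

Sorted maps are used here because both exact match and range scan then reduce to ordered key lookups, which is the same access pattern a sorted key-value store offers.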

  20 Nodes, 1TB/Node

                       Hive                 HT-p1                 HT-p2
  Memory Consumption   No additional cost   8GB/Node*TB           2GB/Node*TB
  Load Speed           20MB/s·Node          2.5MB/s·Node          >5MB/s·Node
                       (No Index)           (Primary Index)       (Primary Index)
  Index Query          N/A                  <10 sec               <10 sec
HugeTable IUD Support

Goal: Support Insert, Update and Delete on application data.

[Diagram: IUD data flow]
 Statements: IUD Statement, Select
 Components: Meta Data, Query Engine, HT Data (HDFS), IUD Table (HBase), Offline Merger
 Flows: Find IUD table, Write IUD Data, Read IUD Data
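
One way to read the diagram is as a merge-on-read design: changes are written to a separate IUD table, reads overlay those changes on the base data, and an offline job later folds them in. The sketch below assumes that reading; `IudSketch` and all of its methods are hypothetical names, and plain maps stand in for HT data on HDFS and the IUD table in HBase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical model of insert/update/delete over read-only base data.
class IudSketch {
    // Base records (stands in for HT data files on HDFS).
    private Map<String, String> base = new TreeMap<>();
    // Pending changes (stands in for the IUD table in HBase).
    // An empty Optional marks a delete.
    private final Map<String, Optional<String>> delta = new HashMap<>();

    IudSketch(Map<String, String> initial) {
        base.putAll(initial);
    }

    // Insert and update are both recorded as a new value for the key.
    void upsert(String key, String value) {
        delta.put(key, Optional.of(value));
    }

    // Delete is recorded as a marker; base data is left untouched.
    void delete(String key) {
        delta.put(key, Optional.empty());
    }

    // Select: a pending change wins over the base record.
    Optional<String> select(String key) {
        if (delta.containsKey(key)) {
            return delta.get(key);
        }
        return Optional.ofNullable(base.get(key));
    }

    // Offline merge: fold pending changes into a new base, then clear them.
    void merge() {
        Map<String, String> merged = new TreeMap<>(base);
        for (Map.Entry<String, Optional<String>> change : delta.entrySet()) {
            if (change.getValue().isPresent()) {
                merged.put(change.getKey(), change.getValue().get());
            } else {
                merged.remove(change.getKey());
            }
        }
        base = merged;
        delta.clear();
    }

    int pendingChanges() {
        return delta.size();
    }
}
```

Under this assumption, query results are the same before and after a merge; the merge only moves the cost from read time to an offline batch step.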
HugeTable Access Control

Goal: Support Multiple Users from Multiple Applications, w/o mutual trust


 Database privileges:
   1. Meta Data: Index, Create, Drop
   2. User Data: IUD
 User Access Level:
   1. System Administrator
   2. User Manager
   3. User

[Diagram: access control flow]
 Operations: Grant/Revoke, DDL/DML, Loader/Portal
 Components: Meta Data, ACL Module
 Flows: Check Privileges
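
The privilege check can be sketched as follows. The slide names the privileges and the three access levels but not their exact semantics, so the rules here are assumptions: system administrators hold every privilege, only system administrators and user managers may grant or revoke, and ordinary users need explicit grants. `AclSketch` and its methods are hypothetical.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical ACL module; rule details are assumptions, not HugeTable behavior.
class AclSketch {
    // Meta data privileges (INDEX, CREATE, DROP) and user data privileges (IUD).
    enum Privilege { INDEX, CREATE, DROP, INSERT, UPDATE, DELETE }
    enum Level { SYSTEM_ADMINISTRATOR, USER_MANAGER, USER }

    private final Map<String, Level> levels = new HashMap<>();
    private final Map<String, EnumSet<Privilege>> grants = new HashMap<>();

    void addUser(String user, Level level) {
        levels.put(user, level);
        grants.put(user, EnumSet.noneOf(Privilege.class));
    }

    // Assumption: only administrators and user managers change privileges.
    private void requireManager(String actor) {
        Level level = levels.get(actor);
        if (level != Level.SYSTEM_ADMINISTRATOR && level != Level.USER_MANAGER) {
            throw new SecurityException(actor + " may not change privileges");
        }
    }

    void grant(String actor, String user, Privilege privilege) {
        requireManager(actor);
        grants.computeIfAbsent(user, u -> EnumSet.noneOf(Privilege.class)).add(privilege);
    }

    void revoke(String actor, String user, Privilege privilege) {
        requireManager(actor);
        EnumSet<Privilege> held = grants.get(user);
        if (held != null) {
            held.remove(privilege);
        }
    }

    // Called before a DDL/DML statement or a loader/portal operation runs.
    boolean check(String user, Privilege privilege) {
        if (levels.get(user) == Level.SYSTEM_ADMINISTRATOR) {
            return true; // assumption: administrators hold every privilege
        }
        EnumSet<Privilege> held = grants.get(user);
        return held != null && held.contains(privilege);
    }
}
```

Keeping the check in one module means every entry point (SQL statements, loader, portal) can call the same `check` before touching data, which fits the stated goal of serving applications that do not trust each other.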
Administration Portal

Goal: Unified HugeTable management point, decrease management effort


 Data Management: DB/TBL/IDX
 User Management: Add/Delete/Modify
 Monitor & FM: Log/Alert/Service
 Configuration: Deploy/Setup
HugeTable Application API
 Various kinds of Applications

 JDBC/SQL API
   • Migration of traditional database applications
   • For SQL developer
   • Batch processing & interactive
 MapReduce API
   • Compatible with Hadoop MR API
   • For data analysis, e.g. data mining
   • Work with HT records format
   • Access control
 BigTable API
   • BigTable/HBase style API
   • For NoSQL application, on HFile2
   • Range scan, Key-value access
   • Access Control


    // MapReduce API: map and reduce signatures over HT records
    public void map(LongWritable key, HugeRecord value,
           OutputCollector<HugeRecordRowKey, HugeRecord> output,
           Reporter reporter);

    public void reduce(HugeRecordRowKey key,
           Iterator<HugeRecord> values,
           OutputCollector<HugeRecordRowKey, HugeRecord> output,
           Reporter reporter);

    // BigTable API: scan one partition and print up to 10 rows
    Table table = new Table("gdr", "admin", "admin");
    String[] families = new String[] {"default"};
    String[] partitions = new String[] {"dt=20100317"};
    int limit = 10;
    TableScannerInterface tsi = table.getScanner(
            new byte[0], new byte[] {Byte.MAX_VALUE}, families, partitions);
    for (int i = 0; i < limit; ++i) {
        GroupValue gv = tsi.next();
        for (String family : families) {
            System.out.println(family + " = " + Bytes.toString(gv.getByteValue(family)));
        }
    }
HugeTable based Telco Application Solutions
  Heavy Requirements, e.g.
      Batch processing
      Complex data analysis
      Interactive query on CDR
      Statistic and reporting

[Diagram: Telco application solution]
 Telco App: Reporting, Interactive Simple Query, Complex Analyze, Interactive Complex Query
 Components: Data Source (x2), Data Aggregator, Database, HugeTable Cluster + Data Mining Tool kits, Data warehouse
 Labels: Mass Data Store, Batch processing, Statistic
Future works

 Column Storage Engine
   File Format
   Compression
   Local Index
   Global Index
 Query Optimization
   Join Optimization: index
 Load Optimization
   Parallel Load
 Application Solution
Thanks for your time!
   China Mobile Research Institute
