Driving Data Science at
scale using PostgreSQL,
Greenplum and Dataiku
PostgresConf 2019
Nicolas GAKRELIDZ
Partner Solution Architect
Dataiku DSS is:
• Collaborative,
• For all profiles,
• Polyglot,
• Production ready
End-to-end Enterprise AI platform
Dataiku DSS
Supporting the Enterprise AI Journey of
Manufacturing Financial Services
Services Consumer Goods
Technology Consulting
E-Retail Media
Healthcare Travel
Global Presence
A WIDE USER BASE
POWERED BY A STRONG ORGANIZATION
Dataikers
220
BACKED BY MAJOR PARTNERS
Customers
220+
Users
20,000
80%+ of customers expand
usage after first year
Raised so far
$146M
Customers Across Industries
POWERING INDUSTRY LEADERS
The “Tower of Babel” Effect of Data Projects
The Classic Data Project Silos
Business
Analyst
DATA PREPARATION ML MODELING ML DEPLOYMENT
Data Preparation
Data Science Notebooks
& API Platforms
AutoML
Solutions
Data Scientist
Data Engineer
Bring Business Analysts, Engineers, and Scientists Together
Share a common environment to have an impact
DATA PREPARATION ML MODELING ML DEPLOYMENT
Business
Analyst
Data Engineer
Data Scientist
Single Collaborative, Governable and Auditable Environment
Leverage existing skills
and secure sustained
availability
Maximise usage of most
up-to-date technologies
Extend based on current
and future operating
requirements
Get Results Today, Build for Tomorrow
Future proof your data effort
Use your current
infrastructure and be
ready for tomorrow’s
Bokeh
Fortune 500 Customer Rockets through Acceleration Phase
Customer Testimony
Quarterly Evolution of Dataiku Users
Analytics
Leader
10 Project Leaders
Scale their team to
deliver
10x Projects / Briefs /
Models / ...
Business
Analyst
500 Business Analysts
Leverage Large and
Complex Data Sources
Independent to Deliver New Projects
Accelerate by leveraging tools
packaged by Data Scientists
100 Data
Scientists
Focus On Complex
Data Processing
Deliver Code and
Plugins for Reuse
Data
Scientist
20 Data Engineers
Ensure availability of data
infrastructures
Operationalize, monitor
and maintain data
projects
Data
Engineer
Delivering 1,000s of analyses, insights,
models and optimized business
processes
Enable Self-Service Analytics and Operationalize ML
The Two Key Modes of Data Innovation
SSA
Quick answers to
unformulated questions
Directly by the end-users
Pervasive
Agile and instantaneous
Limited integration
High volume
o16n
Robust solutions to
business challenges
Organization-driven
Focused
Longer term
Fine integration
High value projects
How a Major Software Player Auto-Deploys 12,000 Models
Customer Testimony
Dataiku Customer provides a sales management software platform to 4,000 B2B clients
(including several Fortune 100 companies), and has deployed Dataiku in order to:
• Design complex recommendation engines combining price, content and demand logics (the final models actually combine 3 predictive models)
• Automatically generate such recommendation engines based on each seller's data and data models
• Operate models in real time and update them with no downtime, scaling up on a fully managed platform on top of Kubernetes
An AI-enabled Layer on top of an existing product
Powered by Dataiku
Leverage your full stack and skills
Dataiku Solution Overview: Architecture
LINUX SERVER
ON PREMISE OR MANAGED
CLOUD
CENTRALIZED
OR AD-HOC
DATA SOURCES,
DATABASES,
DATA LAKE
AVAILABLE OR SPUN-UP
PROCESSING RESOURCES
Leveraging best
storage and
compute
resources
Dataiku deployment servers for
enterprise grade
operationalization
PRODUCTION
SYSTEMS
Centralized server to
facilitate
access to data, resources,
Browser
based
interface
VISUAL DEVELOPMENT
COMPLETE
CODING
ENVIRONMENTS
VISUALIZATION
COLLABORATION AND
PROJECT
MANAGEMENT
AUDIT,
MONITORING
AND
SCHEDULING
User/task specific
interaction modes
4 components
Dataiku DSS Public API
Dataiku DSS components
Data Scientist Business Analyst Data Engineer
Machine Learning Model Deployment Data Management
MADlib
In-database
machine learning
Graph
Relationship
Analytics
Greenplum
Integrated and cleansed data,
parallel SQL processing
GPText
Fast index,
search, text
analytics
PostGIS
Location analytics
Enable In-Database Analytics & Operationalized ML
Dataiku & Pivotal® Greenplum’s Value
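To make "in-database machine learning" with MADlib more concrete, here is a minimal sketch of how a training call could be issued as plain SQL, so that the model is fitted where the data lives. It only builds the statement and does not connect to a database. The call shape follows MADlib's `linregr_train(source_table, out_table, dependent_varname, independent_varname)`; the table and column names (`houses`, `houses_model`, `price`, `tax`, `bath`, `size`) and the helper function itself are hypothetical, and the sketch assumes MADlib is installed in a schema named `madlib`. It is not taken from the slides or from Dataiku.

```python
import re

# Only plain identifiers are accepted, because the names are interpolated
# into SQL text. This is a deliberately strict, illustrative check.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name):
    """Return the name unchanged if it is a simple SQL identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def madlib_linregr_train_sql(source, output, target, features):
    """Build a MADlib linear regression training statement.

    The leading 1 in the ARRAY expression is the intercept term.
    """
    cols = ", ".join(_check(c) for c in features)
    return (
        "SELECT madlib.linregr_train("
        f"'{_check(source)}', '{_check(output)}', '{_check(target)}', "
        f"'ARRAY[1, {cols}]');"
    )


# Hypothetical table and column names, for illustration only.
sql = madlib_linregr_train_sql(
    "houses", "houses_model", "price", ["tax", "bath", "size"]
)
print(sql)
```

Run against a Greenplum or PostgreSQL database with MADlib installed, a statement of this shape would write the fitted coefficients to the output table without moving the training data out of the database.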
High-Performance Analytics at Petabyte Scale
▪ Dataiku leverages Pivotal® Greenplum for in-database parallel
processing of complex queries, visual analysis and charts.
Simplify Collaboration across Data Teams
▪ End-to-end project collaboration for data scientists and
engineers
▪ Self-service access to data sources
▪ Visual Development experience for building comprehensive
analytics pipelines
Mature Your Data Analytics Operations
▪ Enable self-service analytics of large datasets stored in
Pivotal® Greenplum
▪ Enforce data governance between roles and teams
▪ Enable comprehensive of machine learning pipelines and
models.
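The idea of in-database processing listed above can be sketched in a few lines. This is an illustration of the general technique, not Dataiku's implementation: Python's built-in `sqlite3` stands in for a PostgreSQL or Greenplum connection, and the `orders` table, its columns and all function names are hypothetical. The first function pulls every row to the client and aggregates there; the second pushes the aggregation down as SQL, so only one row per group leaves the database.

```python
import sqlite3
from collections import defaultdict


def make_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("EU", 10.0), ("EU", 5.0), ("US", 7.5), ("US", 2.5), ("APAC", 4.0)],
    )
    conn.commit()
    return conn


def totals_client_side(conn):
    """Fetch every row and sum per region in Python."""
    totals = defaultdict(float)
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        totals[region] += amount
    return dict(totals)


def totals_pushed_down(conn):
    """Let the database aggregate; only one row per region is returned."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    )
    return dict(cursor.fetchall())


conn = make_demo_db()
print(totals_client_side(conn))
print(totals_pushed_down(conn))
```

Both functions return the same totals. On an MPP database such as Greenplum, the pushed-down form is the one that lets the work be spread across segments, while the client-side form is limited by how fast rows can be transferred.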
Solution Features
Dataiku & Pivotal® Greenplum’s Value
Dataiku + Postgres and Greenplum (example)
Order
Data
Movements
(if compatible)
Dataiku Datasets:
● Index definitions
● Incremental
SQL push
back: Charts using
SQL
Pushback
…
Storage
©2019 dataiku, Inc. | dataiku.com | contact@dataiku.com | @dataiku

Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenplum Summit 2019
