Case Study: How J.B. Hunt is Driving Efficiency with AI and Real-Time Automated Data Pipelines
November 3, 2020
Presentation featuring presenters from Qlik,
Databricks and J.B. Hunt
Meet our Presenters
Ritu Jain, Director, Product Marketing, Qlik
Nauman Fakhar, Director, ISV Solutions, Databricks
Joe Spinelle, Director, Engineering & Technology, J.B. Hunt
J.B. Hunt: Improving Efficiency with Qlik and Databricks
Maximize Your Data Lake Value
Nov 2020
Ritu Jain
Director of Product Marketing, Qlik
Who We Are
50,000+ customers
100+ countries
2,000+ employees
Market Momentum: Double Digit Growth; Fastest Growing Vendor in Data Integration Market
Global Ecosystem: 1,700 Partners (Accenture, Deloitte, Cognizant, Microsoft, AWS, DataRobot, Snowflake, Google)
Industry Leader: Gartner Magic Quadrant for BI & Analytics 10 years in a row
What We Do
We help organizations turn data into business value.
Raw Data → Actionable Data → Actionable Insights → Business Value
Data Drives Business Value
Yet most struggle…
32% 24% 10%
of business relevant data is used for analysis
of executives say they can create value from data
of business decision makers feel data literate
Most organizations struggle to make analytics-ready data available, let alone turn it into business value
Data-to-Value Continuum
Capture Ingest Store Process Manage Consume
Business
Impact
$ Value Creation $$$$
“Storing data in a data lake is important, but it is only the beginning. Making data useable and accessible is critical to value creation.”
What’s Holding You Back?
The Data Challenge…
Ingest: Siloed, Multiple Sources; Data Latency
Store & Process: Cost & Reliability; Time to Analytics-Readiness; Skills Scarcity
Consume: Search & Understand; Self-Serve Accessibility
Manage: Data Integrity, Trust; Governance, Security
Qlik for Databricks: End-to-End Data Pipeline
Data Warehouse
Mainframe
SaaS
RDBMS
Files
SAP
CDC & Ingest
Load raw data directly into
Delta Lake
Batch Load
CDC
DDL Propagation
Store & Apply
1. Auto-generate Spark-SQL to merge change deltas
2. Enrich and transform data in Databricks
3. Execute in Delta Engine to generate ODS
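The slide does not show the SQL that gets generated, so the following is only a rough sketch of what "auto-generating Spark-SQL to merge change deltas" could look like. It builds a Delta Lake MERGE statement as a string. The function name, the table names, and the `op` column with its 'I'/'U'/'D' flags are all assumptions made for illustration, not Qlik's actual output.

```python
from typing import Sequence


def build_merge_sql(target: str, changes: str, keys: Sequence[str],
                    columns: Sequence[str], op_col: str = "op") -> str:
    """Build a Delta Lake MERGE statement that applies a batch of changes.

    Assumption (hypothetical layout): the `changes` table holds at most one
    row per key, with an operation flag in `op_col` where 'D' means delete
    and any other value means insert or update.
    """
    if not keys:
        raise ValueError("at least one key column is required")

    # Join condition on the primary-key columns.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    # Only non-key columns need to be overwritten on update.
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns if c not in keys)
    col_list = ", ".join(columns)
    val_list = ", ".join(f"s.{c}" for c in columns)

    lines = [
        f"MERGE INTO {target} AS t",
        f"USING {changes} AS s",
        f"ON {on_clause}",
        f"WHEN MATCHED AND s.{op_col} = 'D' THEN DELETE",
    ]
    if set_clause:
        lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    lines.append(
        f"WHEN NOT MATCHED AND s.{op_col} <> 'D' "
        f"THEN INSERT ({col_list}) VALUES ({val_list})"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(build_merge_sql("ods.orders", "stage.orders_changes",
                          ["order_id"], ["order_id", "status"]))
```

On a cluster, the resulting string would typically be passed to `spark.sql(...)`. Generating the statement from table metadata, rather than writing it by hand per table, is what lets the merge step be automated across many sources.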
Delta Lake Delta Table
Find
Shop Publish
Prepare
AI, ML Analytics
Catalog & Publish
Consume & Analyze
Ingest, Apply, Catalog & Analyze
ML & Score
1. Develop, train and score
ML models
Qlik + Databricks
For Maximum Business Value…
Ingest: Siloed, Multiple Sources; Data Latency
Store & Process: Cost & Reliability; Time to Analytics-Readiness; Skills Scarcity
Consume: Search & Understand; Self-Serve Accessibility
Manage: Data Integrity, Trust; Governance, Security
Change Data Capture
Universal Connectivity
Multi-Cloud
Full Automation
Change Propagation, ACID
Metadata & Lineage
Access Controls
Data Encryption
Search & Evaluate
Shop, Prepare and Self-Provision
Nauman Fakhar
Director, ISV Solutions
Helping data teams solve their toughest problems
▪ Global company with over 5,000 customers and 450+ partners
▪ Original creators of popular data and machine learning open source projects
A unified data analytics platform for accelerating innovation across data engineering, data science, and analytics
Unlocking business value: Four challenges
1. Data is messy, siloed and slow (Warehouses, Streams, Lakes)
2. ML is hard, production is harder (Data Scientists in Business, Data Engineers in IT)
3. BI is limited to a fraction of data
4. Lack of enterprise readiness (Fragmented security, Poor reliability, Disjointed governance)
Make all your data ready for analytics and ML
Data is messy, siloed
and slow
Your Existing Data Lake
Open High Quality Fast
BI
Reporting
Machine
Learning
Azure Data Lake
Storage
Amazon S3
Unified Data Service
Build open, reliable, fast data lakes with all your data
Unified
Engine
1
Warehouses
Streams
Lakes
Business Data
Big Data
Applications
ML is hard,
production is harder
2. Data Science and ML Workspace
Unify data and ML across the full lifecycle
Tracking
Experimental Staging
Deployment
Building Models
Databricks Runtime for ML
Data
Standardize ML lifecycle from experimentation to production
Parameters Metrics Models
Production
Business and Big Data
Data Scientists
in Business
Data Engineers
in IT
BI is limited to a fraction of data
3. BI Integrations for data lake
Enable analytics directly on all your source data
Reports Dashboards Ad hoc data science
Applications Files
All your data with high quality and great performance
Data stores
Lack of enterprise readiness
4. Databricks Enterprise Cloud Service
Leverage a cloud-native platform for an enterprise-grade solution
Your cloud account
Your identity provider
DATA SCIENTISTS
ML ENGINEERS
DATA ANALYSTS
DATA ENGINEERS
Highly reliable, secure managed service
Azure Data Lake Storage
Amazon S3
All your data
1000s of users
Fragmented security
Poor reliability
Disjointed governance
Data Lake for all your data
One platform for every use case
Structured transactional layer
High performance query engine
Databricks Unified Data Analytics Platform
BI Reports &
Dashboards
Data Science
Workspace
Machine Learning
Lifecycle
Structured, Semi-Structured and Unstructured Data
DELTA ENGINE
Case Study: J.B. Hunt
Improving efficiency with AI and real-time automated data pipelines
Joe Spinelle
Director, Engineering & Technology
Qlik + Databricks
The need for real time pipelines…
EDI: Visibility into potential problems happening now
Data Science: Training models regularly on up-to-the-minute data
Asset Telemetry: The ability to know where our assets are and act on problems
Applications: Need to be able to see what is happening now, without putting load on production databases
Analytics: Infused analytics require real-time, actionable data; we need to do more than understand the past
Much, much more! Every facet of our business can benefit from fresh data
J.B. Hunt
Why create a cloud data lake…
ML Code
Configuration
Data Collection
Data Verification
Feature Extraction
Machine Resource Management
Analysis Tools
Process Management Tools
Serving Infrastructure
Monitoring
Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small green box in the middle. The required surrounding infrastructure is vast and complex. Going with a cloud data lake greatly reduced the time to realize benefits.
J.B. Hunt
The efficient pathway to real time data…
Monitoring: We required centralized monitoring so our team could handle problems proactively
Ingestion flexibility: A variety of data needs to be contained in the lake, including mainframe, SQL, and other sources
Repository flexibility: Blob data, Event Hub data, and other unstructured data types must be supported
Automation: Ingestion and replication need to be efficient for our busy team
J.B. Hunt
Improving the way we deliver analytics…
Manually Created
Once a day reload
Operational Apps
Staging Reporting
Warehouse
Manually Created
Once a day reload
Power BI
Data Insights Teams
• Mainly supports bolt-on reporting
• Time Intensive
• Results only after entire process
• Highly Manual
• Restricted Access due to technology limitation
• Changes to source systems require manual changes downstream
Most time taken in process
30m Refresh
Mainframe
J.B. Hunt
Improving the way we deliver analytics…
• Supports data science, analytics, applications
• Near real-time
• Results continuously update
• Highly Automated
• Secured and available
• Changes to source systems reflected downstream
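To make "results continuously update" and "changes to source systems reflected downstream" a little more concrete, here is a minimal in-memory sketch of applying a stream of change events to a replica, as opposed to reloading everything once a day. The event shape (operation, key, row) and every name in it are hypothetical; the slides do not describe the actual event format.

```python
from typing import Any, Dict, Iterable, Tuple

Row = Dict[str, Any]
# Hypothetical change event: (operation, primary key, changed columns).
Change = Tuple[str, Any, Row]


def apply_changes(replica: Dict[Any, Row], changes: Iterable[Change]) -> Dict[Any, Row]:
    """Apply change events to the replica in arrival order and return it."""
    for op, key, row in changes:
        if op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            replica.pop(key, None)
        elif op in ("insert", "update"):
            # Merge the changed columns into the existing row. A column that
            # first appears in a later event (a source schema change) is
            # simply carried along.
            merged = dict(replica.get(key, {}))
            merged.update(row)
            replica[key] = merged
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica


if __name__ == "__main__":
    events = [
        ("insert", 1, {"status": "booked"}),
        ("update", 1, {"status": "in_transit", "eta_hours": 5}),
        ("insert", 2, {"status": "booked"}),
        ("delete", 2, {}),
    ]
    print(apply_changes({}, events))
```

With these sample events the replica ends up holding only key 1, with the status `in_transit` and the new `eta_hours` column, even though that column did not exist when the row was first inserted.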
Enterprise Architectures are Complex…
Five Key Takeaways for Success
Accelerate: Data replication with change data capture to convert slow data into fast data
Automate: Entire data pipeline to deliver a continuous stream of up-to-date, analytics-ready data for your AI, ML and data science initiatives
Catalog: To generate rich metadata, persist end-to-end lineage, and ensure data governance and security
Curate: A data marketplace to enable data consumers to self-sufficiently search, understand, evaluate and access data
Future-Proof: Adapt to changing data architecture requirements with a platform-independent solution
For more information contact sales@qlik.com or info@databricks.com
Contact Us for a Tech Call
