Distilling insights @ AppsFlyer
Arnon Rotem-Gal-Oz
Chief Data Officer
Distilling Insights @ Appsflyer (Data Architecture)
[Architecture diagram; component labels only, connections not recoverable]
Event types: installs, clicks, in-app, launches
Accounts
Kafka
Secor
Raw (sequence files)
Spark ETL
DW (parquet files)
Spark Aggregations
DM (Aggregations)
Spark ML
Columnar Database (Redshift, evaluating Vertica)
IMDG (Ignite, evaluating Geode)
SparkSQL (evaluating Drill, Presto)
SQL
Application dashboard
Self-serve BI (TBD)
Internal tools
Other labels: Latest Events, Scoring, exploration, Agg. logic
Data’s hierarchy of needs*
*With apologies to Maslow
Acted upon
Presented
Distilled
Usable
Accessible
Exist
Exist
Working off of RAW data
“Malting”
Just slap SQL on everything
Accessible
Fermenting
Usable
Distilling  
Distilled
RT insights
Predictive
Prescriptive
Dashboards
whatnot
Presented
Sidetrack:
On use of Spark
Hadoop & Mesos
Land data in a queue
All data is time-series
Enrich with foreign keys before persisting
Analyze and balance jobs
Not everything is big data
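A minimal sketch of two of the points above: treating every event as time-series data, and enriching it with a foreign key before persisting. It is an illustration only, not AppsFlyer's implementation. The names `RawEvent`, `EnrichedEvent`, `enrich`, `persist`, the `app_id` to `account_id` lookup, and the hourly bucket key are all hypothetical, and a plain dict stands in for real storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class RawEvent:
    """An event as it might arrive from the queue (hypothetical shape)."""
    event_type: str          # e.g. "install", "click", "launch"
    app_id: str
    timestamp: datetime      # every event carries its own time


@dataclass(frozen=True)
class EnrichedEvent:
    """The same event with the account foreign key attached."""
    event_type: str
    app_id: str
    account_id: Optional[str]
    timestamp: datetime


def enrich(event: RawEvent, app_to_account: Dict[str, str]) -> EnrichedEvent:
    """Attach the account key before the event is written anywhere.

    Unknown apps get account_id=None rather than being dropped.
    """
    return EnrichedEvent(
        event_type=event.event_type,
        app_id=event.app_id,
        account_id=app_to_account.get(event.app_id),
        timestamp=event.timestamp,
    )


def persist(events: Iterable[EnrichedEvent],
            store: Dict[str, List[EnrichedEvent]]) -> None:
    """Append events to hourly buckets, so storage is ordered by time."""
    for e in events:
        bucket = e.timestamp.strftime("%Y-%m-%d-%H")
        store.setdefault(bucket, []).append(e)


if __name__ == "__main__":
    lookup = {"app-1": "acct-9"}   # assumed app -> account mapping
    store: Dict[str, List[EnrichedEvent]] = {}
    raw = RawEvent("install", "app-1",
                   datetime(2015, 6, 1, 12, 30, tzinfo=timezone.utc))
    persist([enrich(raw, lookup)], store)
    print(sorted(store))
```

Doing the lookup once, at write time, means later aggregations per account can group on a stored column instead of joining against the accounts data on every query.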
We’re hiring…
jobs@appsflyer.com
